Scatter Plot Description

How to use this page

This web page allows users to create scatter plots and, optionally, to compute linear regression parameters for any user-selected pair of parameters from the hourly or high-resolution OMNI data set. That is, users may obtain an equation of the form Y = aX + b and a corresponding cross-correlation coefficient, where X and Y are selected by the user from the list of OMNI parameters.
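To illustrate the two quantities named above, here is a minimal sketch that assumes an unweighted fit (every point given equal uncertainty) over plain Python lists. The helper `fit_line` is hypothetical and is not the OMNIWeb code. It returns the slope a and intercept b of Y = aX + b from ordinary least squares, together with the Pearson linear correlation coefficient r.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of Y = a*X + b (minimizing vertical residuals) and the
    Pearson linear correlation coefficient r. Unweighted: every point counts equally."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("X or Y is constant; the fit or r is undefined")
    a = sxy / sxx                   # slope
    b = my - a * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


# Hypothetical sample values (not OMNI data) lying exactly on Y = 2X + 1
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```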

Users specify the time span and time resolution for the analysis. They may exclude any points whose Y and/or X values lie outside user-specified ranges. More generally, they can exclude points for which any OMNI parameter lies outside a user-specified range.
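The range filtering described above can be sketched as follows. This assumes each data point is a Python dict keyed by parameter name; the helper names and the keys "Bz" and "V" are hypothetical, chosen only for illustration, and handling of missing data (OMNI fill values) is deliberately left out.

```python
def in_range(value, low, high):
    """True if value lies within [low, high]; a bound of None means unbounded."""
    return (low is None or value >= low) and (high is None or value <= high)


def select_points(records, ranges):
    """Keep only records in which every constrained parameter is within its range.

    records: list of dicts mapping parameter name -> value
    ranges:  dict mapping parameter name -> (low, high)
    """
    return [
        rec for rec in records
        if all(in_range(rec[name], low, high) for name, (low, high) in ranges.items())
    ]


# Hypothetical records with illustrative parameter names
records = [
    {"Bz": -3.0, "V": 450.0},
    {"Bz": 2.0, "V": 700.0},
    {"Bz": -6.0, "V": 380.0},
]
# Keep points with Bz <= 0 and V >= 400
print(select_points(records, {"Bz": (None, 0.0), "V": (400.0, None)}))
# [{'Bz': -3.0, 'V': 450.0}]
```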

Update added November 2009. Before this update, the scatter plots and regression fits enabled by this interface were for concurrent data only. The description of the basic functionality follows this update, which is set off by rows of asterisks.

*************
We have added the ability to perform regression fits and scatter plots in which the Y variable is time-lagged relative to the X variable by a single lag of N data points, and to compute auto-correlation coefficients for (X,X) or cross-correlation coefficients for (Y,X) for multiple lags from zero to Nmax.

For a step size M, correlation coefficients are computed for lags of 0, M, 2M, 3M, ..., N*, where N* = M * Int(Nmax/M). Thus if Nmax is 7 and M is 2, correlation coefficients are computed for lags of 0, 2, 4 and 6 data points. Currently, N (or Nmax, in the multiple-lag case) must be between zero (the no-lag case) and 60. Likewise, M must be chosen so that the number of correlation coefficients computed (including lag 0) is 11 or less. This is ensured by requiring M > Int[(Nmax-1)/10], or equivalently Nmax/M <= 10.
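The lag bookkeeping above can be sketched in a few lines, assuming X and Y are equal-length, evenly spaced series with no data gaps (real OMNI data would need fill values removed first). `lag_list`, `pearson` and `lagged_correlations` are hypothetical helpers; the limits checked are the ones stated above, and passing the same series as both X and Y gives the auto-correlation case.

```python
import math

MAX_N = 60        # N (or Nmax) must lie between 0 and 60
MAX_RATIO = 10    # Nmax/M <= 10 keeps the count at 11 or fewer, lag 0 included


def lag_list(nmax, m):
    """Lags 0, M, 2M, ..., M*Int(Nmax/M), after checking the stated limits."""
    if not 0 <= nmax <= MAX_N:
        raise ValueError("Nmax must be between 0 and 60")
    if m < 1 or nmax > MAX_RATIO * m:
        raise ValueError("M must be >= 1 and satisfy Nmax/M <= 10")
    return list(range(0, m * (nmax // m) + 1, m))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(x, y, nmax, m):
    """Map each lag to corr(X(t), Y(t + lag)); pass y = x for auto-correlation.
    Assumes len(x) == len(y) and that the series are much longer than Nmax."""
    out = {}
    for lag in lag_list(nmax, m):
        out[lag] = pearson(x[:len(x) - lag], y[lag:])
    return out


print(lag_list(7, 2))  # [0, 2, 4, 6]
```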

Because the single-lag and multiple-lag cases cannot be confused, the interface uses just "N", meaning N in the single-lag case and Nmax in the multiple-lag case.

Users must first specify a value of N, then specify whether they want (a) a full analysis (scatter plot, regression fit, correlation coefficient) for a single lag, (b) merely a listing of the points selected for single-lag analysis, or (c) correlation coefficients for multiple lags. For the multiple-lag case, a value of M must also be specified.

When filtering (discussed below), all parameter values are taken at the time of the X variable, except Y itself, which is taken at the lagged time.
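A minimal sketch of this lagged filtering rule, assuming time-ordered records stored as dicts with no data gaps; `lagged_pairs` and the keys "x", "y" and "z" are hypothetical. The Y range is tested on the record N steps later, while every other range is tested on the record at the X time.

```python
def lagged_pairs(records, x_name, y_name, n, ranges):
    """Pair X at step i with Y at step i + n, applying the lagged filtering rule.

    records: time-ordered list of dicts, one per data point (no gaps assumed)
    ranges:  dict mapping parameter name -> (low, high); None means unbounded
    """
    def ok(value, bounds):
        low, high = bounds
        return (low is None or value >= low) and (high is None or value <= high)

    pairs = []
    for i in range(len(records) - n):
        at_x, at_y = records[i], records[i + n]
        # Y itself is filtered at the lagged time ...
        y_ok = y_name not in ranges or ok(at_y[y_name], ranges[y_name])
        # ... every other parameter at the time of X
        others_ok = all(ok(at_x[name], b) for name, b in ranges.items() if name != y_name)
        if y_ok and others_ok:
            pairs.append((at_x[x_name], at_y[y_name]))
    return pairs


records = [
    {"x": 1.0, "y": 10.0, "z": 0.0},
    {"x": 2.0, "y": 20.0, "z": 5.0},
    {"x": 3.0, "y": 30.0, "z": 0.0},
    {"x": 4.0, "y": 40.0, "z": 0.0},
]
# z is checked at the X time, so the first pair survives even though z = 5 at the Y time
print(lagged_pairs(records, "x", "y", 1, {"z": (None, 1.0)}))
# [(1.0, 20.0), (3.0, 40.0)]
```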
*************

Users must specify on the interface which parameters are Y and X. They can exclude points based on ranges of these or any other parameters simply by entering low and/or high values on the interface line for each such parameter.

Users may also choose between two linear regression methods. The "delta-Y" method determines a and b by minimizing the sum of squares of (Y(observed) - Y(fit)); it is typically used when the likely errors in the Y parameter significantly exceed those in the X parameter. The "perpdist" method instead minimizes the sum of squares of the perpendicular distances between the observed (Y,X) points and the best-fit line. In its past cross-comparisons of like parameters from the multiple data sets contributing to OMNI, NSSDC has used the perpdist method. See References below.
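When both X and Y carry unit uncertainty, minimizing squared perpendicular distances is orthogonal regression, which has a closed-form solution. The sketch below uses that closed form; it is not the fitexy routine (which handles general uncertainties in X and Y), and `perpdist_fit` is a hypothetical name.

```python
import math


def perpdist_fit(xs, ys):
    """Line Y = a*X + b minimizing the sum of squared perpendicular distances
    (orthogonal regression, equal unit uncertainties in X and Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("X and Y are uncorrelated; the orthogonal slope is undefined")
    d = syy - sxx
    a = (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)  # slope
    b = my - a * mx                                        # intercept
    return a, b


# Hypothetical points; the delta-Y slope for these would be 0.6
print(perpdist_fit([1, 2, 3, 4], [2, 1, 4, 3]))  # (1.0, 0.0)
```

Unlike the delta-Y fit, this line does not depend on which parameter is called X: swapping X and Y gives the reciprocal slope.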

Which method is preferable depends on whether standard deviations are available for the parameters being analyzed. The only hourly (or daily or 27-day) OMNI parameters whose standard deviations are included in the OMNI records are the IMF magnitude, the GSE Cartesian components of the IMF, and the plasma density, flow speed, temperature, flow direction angles and alpha/proton ratio. For High Resolution OMNI, only the IMF magnitude has a standard deviation. When available, the delta-Y linear regression routine uses these standard deviations as the uncertainties in the Y parameter, and the "perpdist" routine uses them as the uncertainties in both the Y and X parameters. When they are not available, an uncertainty of 1.0 is used in these routines.

When doing a regression between one parameter having standard deviations and another not having them, one should use the delta-Y method and let the parameter having standard deviations be the Y variable. When both parameters have standard deviations, the perpdist method may be preferable. Finally, when neither parameter has standard deviations, the user may want to use the delta-Y method twice: fit y = a + bx and x = a' + b'y (here a and a' are intercepts, b and b' are slopes), rewrite the second fit as y = (-a'/b') + (1/b')x, and finally take the average, y = 0.5*[(a - a'/b') + (b + 1/b')*x].
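Both regressions described above can be sketched as follows, assuming the standard weighted least-squares formulas with weights 1/sigma^2 on Y (sigma = 1.0 when no standard deviation is available). This is not the CORRELATE.PRO code; `weighted_fit` and `averaged_fit` are hypothetical helpers, and they follow the y = a + bx notation used above, returning (intercept, slope).

```python
def weighted_fit(xs, ys, sigmas=None):
    """Delta-Y fit y = a + b*x with weights 1/sigma**2 on the Y values.
    sigmas=None means an uncertainty of 1.0 for every point. Returns (a, b)."""
    if sigmas is None:
        sigmas = [1.0] * len(xs)
    w = [1.0 / s ** 2 for s in sigmas]
    s = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = s * sxx - sx * sx
    if delta == 0:
        raise ValueError("all X values are equal; the fit is undefined")
    a = (sxx * sy - sx * sxy) / delta  # intercept
    b = (s * sxy - sx * sy) / delta    # slope
    return a, b


def averaged_fit(xs, ys):
    """Average of the fits y = a + b*x and x = a' + b'*y (the latter inverted)."""
    a, b = weighted_fit(xs, ys)
    a2, b2 = weighted_fit(ys, xs)  # x = a' + b'y
    # Inverted second fit: y = (-a'/b') + (1/b')x; average term by term
    return 0.5 * (a - a2 / b2), 0.5 * (b + 1.0 / b2)


print(averaged_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```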

Note that analyses are limited to 30,000 points when using the delta-Y method and to 12,000 points when using the perpdist method. These limits do not apply to runs involving no regression analysis (i.e. scatter plots only).

As a separate function, the user may "retrieve" all the points selected for the scatter plot and regression fit calculation, or may retrieve points without doing a scatter plot and fit. This latter capability extends the basic OMNIWeb data retrieval functionality by allowing filtering by the values of the selected or other parameters.

References:
Delta-Y method:
Program - CORRELATE.PRO from:
Research Systems, Inc., Interactive Data Language, Version 5.3

Perpdist method:
Program - fitexy.for from:
Section 15.3 in: Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in FORTRAN, Second Edition, Cambridge University Press, New York, 963 pp., 1992.

If you have any questions or comments about this service, contact: Dr. Natalia Papitashvili, Space Physics Data Facility, Mail Code 672, NASA/Goddard Space Flight Center, Greenbelt, MD 20771

NASA Official: Robert Candey (Robert.M.Candey@nasa.gov), Head of the Space Physics Data Facility