DVOS Data Object Library
Data Variable Object Scheme
The purpose of the new data
handling core (DVOS) is to remove performance bottlenecks in
QDOS. At the same time it should simplify coding within QSAS
and plugins, and make the whole package easier to maintain
when it moves wholly into public domain maintenance.
Specifically, in contrast to QDOS, data objects are a single
class and not a hierarchy of classes handling different data
types and
dimensionality. Objects are still accessed through var
pointers, which allows for data to be self-destroying when out
of scope, but it is no longer necessary to narrow objects to
specific sub-classes (scalar, matrix, scalar sequence, matrix
sequence etc).
The single class handles all necessary data conversions and
dimensions transparently, which in turn means less coding in
the calling modules, and hence easier maintenance. The
reduction in layers of abstraction also makes it easier for the
community to understand and maintain the core code itself. As
many operations as possible are handled by the DVOS core, and
this includes intelligent handling of fill values and
metadata. All operations supported are safe against the
underlying data types and invalid operations will do nothing
(rather than throw an exception), so calling a supported
method is always safe. The only exceptions are a few
explicit de-reference methods used for fast access directly
into the data valarrays. They are to be used only for looping
explicitly through the data once dimensionality and data type
have been checked. Safe (but slower) options for each of these
are also supported.
The data itself is held in a valarray, and the object knows
how to unpack the records and matrix dimensions appropriately.
This ensures that copying data and constructing objects that
are sequences of arrays is very much faster than in QDOS where
matrix sequences were sequences of var pointers to matrices.
Most methods and operators are handled at the top level by the
object so as to minimise the number of operations that require
stepping through the valarray data itself. Where possible
valarray supported methods are used to avoid explicit valarray
element access, and for this the fill values are set to NaN
for double data to ensure they remain unaltered by arithmetic
operations.
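The effect of NaN fill values under whole-array arithmetic can be illustrated with a plain std::valarray. This is a minimal sketch and not DVOS code: the helper names countFills and scaleAndShift are hypothetical, and only standard library behaviour is relied upon.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <valarray>

// Hypothetical helper (not part of DVOS): count the NaN fill entries.
inline std::size_t countFills(const std::valarray<double> &v) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (std::isnan(v[i])) ++n;
    }
    return n;
}

// Whole-array arithmetic with no per-element fill test.
// NaN entries propagate as NaN, so fills stay fills.
inline std::valarray<double> scaleAndShift(const std::valarray<double> &v,
                                           double scale, double shift) {
    return v * scale + shift;
}
```

Because any arithmetic involving NaN yields NaN, a fill entry is still recognisable as a fill after the operation, without an explicit element-by-element check.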
Dvar 'Pointers'
The Dvar template class provides a
wrapper for data objects that ensures they are deleted when they
go out of scope. Dvar objects are not strictly pointers: each is
a class instance containing a pointer to the data object, with a
dereference operator (->) defined so that it may be used like a
pointer to the data object.
Every time a var pointer is created, the reference counter for
the target object is incremented, and whenever a Dvar object is
deleted (or goes out of scope) the reference counter is
decremented. When the counter is zero the data object is
deleted.
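The reference counting described above can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the Dvar implementation: the names Counted and Handle, and the 'alive' counter used to observe deletion, are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a data object carrying its own reference count.
struct Counted {
    int refs = 0;
    static int alive;          // live-object count, for demonstration only
    Counted() { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;

// Hypothetical handle: increments the count on creation and copy,
// decrements on destruction, and deletes the target at zero.
template <class T>
class Handle {
public:
    explicit Handle(T *p) : p_(p) { ++p_->refs; }
    Handle(const Handle &o) : p_(o.p_) { ++p_->refs; }
    Handle &operator=(const Handle &o) {
        ++o.p_->refs;          // increment first so self-assignment is safe
        release();
        p_ = o.p_;
        return *this;
    }
    ~Handle() { release(); }
    T *operator->() const { return p_; }
    T &operator*() const { return *p_; }
private:
    void release() { if (--p_->refs == 0) delete p_; }
    T *p_;
};
```

Two handles to the same target share one count, so the target survives until the last handle goes out of scope.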
Explicitly the QSAS data variable object class, DvObject, may be
accessed via the Dvar pointer class for it, DvObject_var. From
here on we shall discuss the DvObject_var derived class. The
class contains a pointer to a DvObject class data object which
is never null.
There are three constructors:
DvObject_var () constructs an 'empty' DvObject (see the DvObject
class), for which is_nil() returns true.
DvObject_var (const DvObject_var &T) constructs an object that
points to the same object that T points to. The target object is
not copied or changed other than to increment its reference
counter.
DvObject_var (DvObject *P) constructs an object that points to
the object P. The target object is not copied or changed other
than to increment its reference counter.
There is an assignment operator =, that takes a DvObject pointer
as argument, and this is used to assign a Dvar pointer to an
object created with the new command. Additionally there is a
similar assignment operator =, that takes a DvObject_var
reference as argument.
For example, where (...) refers to any argument list taken by
DvObject constructors:
    DvObject_var dobj;          // creates a pointer to an 'empty' data object
                                // (is_nil() returns true)
    dobj = new DvObject(...);   // creates a new data object and points dobj to it
                                // (deleting orphaned object automatically,
                                // e.g. 'empty' object above)
    DvObject_var dobj2 = dobj;  // creates a new Dvar pointer to the object in dobj
    DvObject_var dobj3(dobj);   // also creates a new Dvar pointer to the object in dobj
    dobj3 += dobj;              // adds the content of dobj to that of dobj3 (and
                                // ensures units are the same, converting a copy
                                // of dobj on the fly as needed)
Note that a new object is usually created using the assignment
operator (through the explicit 'new'), and the RHS may use any of
the constructors of the DvObject class.
The ptr() method gives
direct access to the pointer to the DvObject
data object inside Dvar.
The de-reference operator
->, gives direct access to the methods
available in the DvObject data class, while the operator, *,
gives access to the data object itself. Hence dobj->seqSize() and (*dobj).seqSize() both
return the length of the data sequence provided by the DvObject
method seqSize().
The comparison operators
== and != compare the
data object_id values.
Each non-empty object has a unique id, which can be found from dobj->get_id(), and these
comparisons determine whether the Dvar pointers refer to exactly
the same target data object. To compare data values other
operators are provided in the DvObject data class, for example *dobj1 ==
*dobj2 compares the data arrays in each object.
'Empty' objects all have id = -1, and are therefore always equal
to each other.
The bool methods is_nil()
and is_ok() return
the equivalent methods from the data class, so dobj->is_nil()
and dobj.is_nil()
are equivalent. If there is data in the target data object, then
is_nil() is false and is_ok() is true, and vice versa. The
pointer to the enclosed data object is never actually nil, and
all methods can be called safely. Data object methods that are
not applicable to the underlying data type do nothing but attach an
error message to any returned object.
Note, however, that the data arrays inside the DvObject are
valarrays, and these do not do array bound checking. Under most
circumstances the DvObjects can perform whatever operations are
required through defined methods and operators, and only when
accessing data element by element will it be necessary to
determine the data type and valarray length. See the DvObject class.
A DvObject_var may be used as a return value from a method. It
may also be passed in an argument by value or reference.
If passed by value a new DvObject_var is created which points to
the same object target. Changes to this object affect the object
in the calling function, but if the DvObject_var is changed to
point to a new target, that change will not propagate back to the
calling function.
Passing by reference is slightly faster. The same pointer and
target are involved, so all changes, including pointing the
DvObject_var at a new target, affect the calling function.
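The difference between the two passing modes can be illustrated with std::shared_ptr standing in for DvObject_var. This is only an analogy under that assumption: the aliases Data and DataVar and the functions byValue and byReference are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Data = std::vector<double>;
using DataVar = std::shared_ptr<Data>;   // stand-in for DvObject_var

// By value: the handle is copied, the target is shared.
inline void byValue(DataVar d) {
    (*d)[0] = 99.0;                        // visible to the caller
    d = std::make_shared<Data>(1, -1.0);   // re-points the local copy only
}

// By reference: the caller's own handle is used.
inline void byReference(DataVar &d) {
    (*d)[0] = 42.0;                        // visible to the caller
    d = std::make_shared<Data>(1, -1.0);   // caller now sees the new target
}
```

After byValue() the caller still holds the original target with its modified content; after byReference() the caller's handle points to the new single-element target.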
Most operators supported by DvObject are also available directly
from DvObject_var for convenience. Operators normally result in
a new object being created and assigned to the target
DvObject_var, the exceptions being operators of the form +=, -=,
/=, *=, which modify the target object directly and are thus
slightly faster than the equivalent binary operators +, -, /, *,
which create a new object.
DvObject Class
Metadata Utilities
Specific utilities for handling useful metadata (CAA, CSDS
and ISTP attributes) are provided. These are part of the
DvObject class and used internally to ensure
operations are allowed, and to perform conversions and
metadata correction within operations. As such, when
DvObject operators are used it is not necessary to call
these methods explicitly as DvObject ensures metadata
compliance where possible.
DvObject_var getDep0() is simply a
call to get_xref(DEPEND_0).
bool hasTimeTags()
returns true if the Depend_0 xref points to a time type
object.
DvObject_var getTimeTags() returns
a var pointer to this object if it is a time object, else
Depend_0 if that is a time object else an
empty var pointer.
DvEvent getTimeRange() returns
the DvEvent corresponding to the start and stop time of the
time tags if they exist, otherwise the default DvEvent().
bool getDataRange(int recFrom, int recTo, double &min, double
&max) returns the minimum and maximum value of the data between
the specified records for data of type double. Returns true if
the data is double and a range is found, else returns false.
bool getDataRange(double &min,
double &max) finds the minimum and maximum
value in the data object. Double and int data are just cast
to double, and time objects return the Epoch2000 values (see
DvTime). It returns true if successful, and false if the
object is nil or the data type is not handled.
DvString
getXrefText(DvString &name)
DvString getXrefText(const char
*name) both return a DvString holding the content of
the named attribute converted (if necessary) to text.
int get_iFILL() returns the
fill value as an integer.
Unit and SI_conversion handling utilities are used within
DvObject operators to ensure units are used and modified
correctly. There are three different unit systems that data
can be converted to/from: the data's own units, the SI units
named in the SI_conversion string, and Base SI units. The
SI_conversion string specifies the units of the data object by
means of a standard SI unit string and a conversion factor that
converts the data into the specified SI units (by multiplying
the data by the conversion factor). The Base SI units are those
that all recognised SI units can be converted into. Units and
conversions can be reduced to this Base SI form to ensure that
the underlying units are the same and that conversions are
between the same unit base.
DvString getSIC(int i=0)
gets the SI_conversion string as a DvString, and if i is
specified returns the ith element of the SI_Conversion
string matrix (e.g. for an rlp vector, although this is
rarely supported in data files).
DvString getBaseSI()
returns the SI_conversion string in base SI form.
double convFactor()
returns the numeric SI_conversion
factor to convert data into its SI form, that is the
numerical value preceding the '>'.
double convFactorToBaseSI()
returns the numeric SI_conversion
factor to convert data into its base SI form. This is often
the same as convFactor().
double convFactorFrom(DvString &SIC) returns
the numeric SI_conversion
factor to convert data from SIC units to this.
bool sameBaseSI(DvObject_var &arg) returns true if this and arg
have equivalent base SI units (e.g. returns true for V / km and
nT m / s).
bool sameBaseSI(const char * SIstr) returns true if this and an
SI_Conversion string SIstr have equivalent base SI units (e.g.
returns true for V / km and nT m / s).
bool sameUnits(DvObject_var
&arg) returns true if this and arg
have the same base SI units and conversion factors.
bool sameUnits(const char * SIstr, int i=0)
returns true if this and
an SI_Conversion string SIstr have
the same base SI units and conversion factors. If i is
specified the ith entry in the SI_Conversion string
matrix of this object is used (e.g. for rlp vectors).
DvString getUnitsProduct(DvObject &arg)
returns the concatenated UNITS string for this and arg.
DvString getUnitsRatio(DvObject &arg)
returns the concatenated UNITS string for this and 1
/ arg.
DvString getUnitsPower(double
p) returns the UNITS string for this raised
to the power p as "(unit)^p".
DvString getUnitsInverse(double p) returns
the UNITS string for 1 / this.
DvString getSICProduct(DvObject
&arg) returns the SI_conversion string for this
times arg.
DvString getSICRatio(DvObject &arg)
returns the SI_conversion string for this / arg.
DvString getSICInverse()
returns the SI_conversion
string for the inverse of this object.
DvString getSICPower(double p) returns the SI_conversion
string for this raised to the power p.
bool hasUnits() returns
true if this has any SI
units after reduction to Base SI form. Used to determine
whether it is valid to take log(), exp() etc. If
SI_Conversion is not found returns false.
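The "factor>unit" layout of an SI_conversion string can be illustrated with a small parser. This is a sketch under stated assumptions and not the DVOS code: SIConversion, parseSIC and factorFromTo are hypothetical names, the example strings are illustrative, and the ratio in factorFromTo assumes both strings name the same SI unit after the '>'.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical parsed form of an SI_conversion string such as "1.0e-9>T".
struct SIConversion {
    double factor;      // multiply the data by this to reach the SI unit
    std::string unit;   // SI unit text following '>'
    bool ok;            // false if no '>' separator was found
};

inline SIConversion parseSIC(const std::string &sic) {
    SIConversion out{1.0, "", false};
    const std::string::size_type pos = sic.find('>');
    if (pos == std::string::npos) return out;
    out.factor = std::strtod(sic.substr(0, pos).c_str(), nullptr);
    out.unit = sic.substr(pos + 1);
    out.ok = true;
    return out;
}

// Assumed relation: data in 'from' units times this factor gives 'to' units,
// valid only when both refer to the same SI unit string.
inline double factorFromTo(const SIConversion &from, const SIConversion &to) {
    return from.factor / to.factor;
}
```

For example, data held in km ("1.0e3>m") would be multiplied by 1000 to match data held in m ("1>m").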
All the following coordinate system methods look for the CSDS
Frame attribute as well as the CAA COORDINATE_SYSTEM,
REPRESENTATION, REPRESENTATION_1, and TENSOR_ORDER attributes.
DvString getFrameAttr()
returns the frame information in the CSDS syntax, e.g.
vector>gse_xyz.
DvString getFrame()
returns the coordinate system part of the frame, e.g. gse.
DvString getRep() returns
the representation string, e.g. xyz, rlp or rtp.
int getOrder() returns the
order of the object (0 for scalar or array, 1 for vector and 2
for rank 2 tensor).
bool sameFrame(DvObject_var &arg) returns true if
this and arg are measured in the same reference
frame.
bool sameFrame(DvString &frame) returns true if
this is measured in the same reference frame as frame
(in CSDS syntax, e.g. vector>gse_xyz).
DvString setFrameAttr(DvString
frameAttr) sets the FRAME, COORDINATE_SYSTEM, ORDER
and REPRESENTATION from the frameAttr string in CSDS syntax,
e.g. vector>gse_xyz.
DvString setFrameAttr(DvString
frame, DvString rep, int order) sets the FRAME,
COORDINATE_SYSTEM, ORDER and REPRESENTATION from the frame,
representation and order.
DvString setFrameProduct(DvObject_var &arg) sets the FRAME,
COORDINATE_SYSTEM, ORDER and REPRESENTATION appropriate after
this object has been multiplied by arg.
int getOrder() returns the
numeric value of the rank of a matrix. e.g. 3-vectors are rank
1.
bool isThreeVector()
returns true if the Frame attribute (if present) contains
vector, or (if absent) the array size is 3.
bool isVectorXYZ() returns
true if it is a cartesian three vector (uses getRep() and
isThreeVector()).
void
ensureVectorXYZ() converts this object into
cartesian representation. Does nothing if already cartesian.
bool
isDeg() returns true if this object is measured in
degrees (uses sameUnits("1>deg")).
bool isRad() returns true if this object is measured
in radians (uses sameUnits("1>rad")).
bool
isAngle() returns true if this object is measured in
either degrees or radians. It checks both the SI_Conversion
and Units attributes.
bool isPhiAngle() returns true if this object is an azimuthal
angle (range > pi, or "phi" or "azimut" used in describing it).
bool isThetaAngle() returns true if this object is a colatitude
(theta) angle (from metadata).
bool
isLatAngle() returns true if this object is a
latitude angle (from metadata).
DvString angleUnitStr
returns a DvString containing one of "deg", "rad" or "".
bool isVectDeg() returns true if any component of
this object is measured in degrees.
bool isVectRad() returns true if any
component of this object is measured in radians.
DvObject_var angleMod(int angMax, DvString rad_deg)
returns this object with its range converted from one
range to another depending on the value of angMax as
follows:
angMax = 360 converts into the range 0, 360.
angMax = 180 converts into the range -180, 180.
angMax = 90 converts between the ranges 0, 180 and -90, 90 (the
algorithm is the same, so repeated application converts back).
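The three range conversions can be sketched for a single value in degrees. This is a minimal illustration, not the DVOS algorithm: the function names are hypothetical, the half-open intervals are an assumption (the text does not say which end points are included), and the angMax = 90 case is assumed to be the mapping 90 - x, chosen because it is its own inverse as the description requires.

```cpp
#include <cassert>
#include <cmath>

// Wrap into [0, 360) (assumed half-open).
inline double wrap360(double deg) {
    double r = std::fmod(deg, 360.0);
    if (r < 0.0) r += 360.0;
    if (r >= 360.0) r = 0.0;   // guards against rounding up to 360
    return r;
}

// Wrap into [-180, 180) (assumed half-open).
inline double wrap180(double deg) {
    return wrap360(deg + 180.0) - 180.0;
}

// Assumed mapping between 0..180 and -90..90; applying it twice is a no-op.
inline double swap90(double deg) {
    return 90.0 - deg;
}

// Dispatch on angMax; other values leave the input unchanged.
inline double angleModSketch(double deg, int angMax) {
    switch (angMax) {
        case 360: return wrap360(deg);
        case 180: return wrap180(deg);
        case 90:  return swap90(deg);
        default:  return deg;
    }
}
```

With these assumptions, -90 becomes 270 for angMax = 360, 270 becomes -90 for angMax = 180, and applying angMax = 90 twice returns the original value.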
bool okDims(DvObject_var &arg) returns true if
the arg array
dimension matches that of this
object, or the arg
array has one element.
bool sameDims(DvObject_var
&arg) returns true if this object and arg have arrays of identical dimensions.
bool conformalDims(DvObject_var &arg) returns
true if this object
is conformal for multiplication by arg. Returns true if either
object has arraySize() = 1, or last dimension of this is the same as first
dimension of arg.
bool squareMat() returns
true if this object
is a square matrix (2 dimensions and both the same).
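The dimension tests above can be expressed on a plain list of array dimensions. This is a sketch of the stated rules only, with an assumed representation: Dims is a hypothetical alias, and an empty dimension list is taken to mean a single value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::vector<std::size_t>;   // hypothetical: one entry per array dimension

// Product of the dimensions; an empty list counts as a single value.
inline std::size_t arraySize(const Dims &d) {
    std::size_t n = 1;
    for (std::size_t k : d) n *= k;
    return n;
}

inline bool sameDims(const Dims &self, const Dims &arg) { return self == arg; }

// arg matches this object, or arg has a single element.
inline bool okDims(const Dims &self, const Dims &arg) {
    return sameDims(self, arg) || arraySize(arg) == 1;
}

// Either side is a single value, or last dimension of self equals first of arg.
inline bool conformalDims(const Dims &self, const Dims &arg) {
    if (arraySize(self) == 1 || arraySize(arg) == 1) return true;
    return self.back() == arg.front();
}

inline bool squareMat(const Dims &d) { return d.size() == 2 && d[0] == d[1]; }
```

For instance a 2x3 matrix is conformal for multiplication by a 3-vector but not by a 2-vector.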
Error Handling
All operators return a valid object. If an operation cannot be
performed, the result is either the original object or a default
single value of appropriate type, and an error message is
appended to the returned object's error list. Error handling
methods are:
void clearError()
empties the object's error list.
void error(const char *flag)
appends the message flag to the object's error list.
DvString getError(size_t i)
returns the error message at position i on the object's
error list. Default is the last error appended.
int nError() returns the
number of error messages on the object's error list. Returns
zero if no errors have been detected.
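The bookkeeping behind these four methods can be sketched with a small class. This is an illustration that reuses the method names listed above on a hypothetical ErrorList type; returning an empty string for a missing entry is an assumption, since the text does not say what getError() does in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error list mirroring the methods described above.
class ErrorList {
public:
    void clearError() { msgs_.clear(); }
    void error(const char *flag) { msgs_.push_back(flag); }
    int nError() const { return static_cast<int>(msgs_.size()); }

    // Default form: the last message appended (empty string if none).
    std::string getError() const {
        return msgs_.empty() ? std::string() : msgs_.back();
    }
    // Indexed form: message i (empty string if out of range, an assumption).
    std::string getError(std::size_t i) const {
        return i < msgs_.size() ? msgs_[i] : std::string();
    }
private:
    std::vector<std::string> msgs_;
};
```

A caller that chains several operations can check nError() once at the end rather than testing every intermediate result.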
Sequence and Joining Utilities
All DvObject operators check that arguments are joined and will
perform a default join automatically (if necessary).
However, joining first may be appropriate if something other
than a default join is required, such as joining to regular
timetags or boxcar averaging.
These are the same utilities that
are used internally by the DvObject operators to ensure that
data are correctly joined. In the following methods the
DEPEND_0 variable may be either DvTime or double values. Data
gaps are never removed in default operations as DVOS operators
join on the fly if needed and binary operators require the
result to be of the same length as the target.
The
default gap size used for linear interpolation is 1.5 times
the target spacing, and gaps are linear filled by default
while ends use nearest neighbour.
The default boxcar uses a box of width twice the target
spacing, and a minimum of 3 points in the box. If fewer than
the minimum number of points fall inside the box, then the
result is a gap. The gap handling default is linear inside a
gap and nearest neighbour on the ends.
The Nearest Neighbour gap default is to use nearest neighbour
inside gaps and off the ends.
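The linear join defaults can be sketched on plain vectors. This is a minimal illustration and not the DVOS code: LinearJoinResult and linearJoinSketch are hypothetical names, the source tags are assumed strictly increasing, and a gap is assumed to be a pair of neighbouring source tags separated by more than the gap width. Gaps are flagged but still linearly filled, matching the stated default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result: joined values plus a flag for targets inside a source gap.
struct LinearJoinResult {
    std::vector<double> values;
    std::vector<bool> inGap;
};

// t, v: source tags (strictly increasing) and values; target: tags to join onto.
inline LinearJoinResult linearJoinSketch(const std::vector<double> &t,
                                         const std::vector<double> &v,
                                         const std::vector<double> &target,
                                         double targetSpacing) {
    LinearJoinResult out;
    if (t.empty() || t.size() != v.size()) return out;
    const double gapWidth = 1.5 * targetSpacing;      // default gap size
    for (double x : target) {
        if (x <= t.front()) {                         // before data: nearest neighbour
            out.values.push_back(v.front());
            out.inGap.push_back(false);
            continue;
        }
        if (x >= t.back()) {                          // after data: nearest neighbour
            out.values.push_back(v.back());
            out.inGap.push_back(false);
            continue;
        }
        // First source tag strictly greater than x; its predecessor brackets x.
        const std::size_t j = static_cast<std::size_t>(
            std::upper_bound(t.begin(), t.end(), x) - t.begin());
        const std::size_t i = j - 1;
        const double w = (x - t[i]) / (t[j] - t[i]);
        out.values.push_back(v[i] + w * (v[j] - v[i]));   // linear fill, gap or not
        out.inGap.push_back(t[j] - t[i] > gapWidth);
    }
    return out;
}
```

The inGap flags show where a non-default gap option (for example removing records) would act differently from the default linear fill.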
The default
gap and join options can be overridden by attaching suitable
xrefs first (see Join Options below) in which case care must
be taken when setting DV_GAP_OPTION to DV_REMOVE
as the result may have fewer records than the target. If
multiple objects are to be joined to the same target with
gaps removed, then MultiJoin()
should be used (see DvJoinList
class below) to ensure they share a common timeline with
gaps in all inputs removed. In all
cases, if gap handling is set to remove gaps, then the end
handling will also remove gaps.
DvObject_var Join(DvObject
&dobj, bool withXrefs) joins this
object onto the Depend_0 of dobj if it exists, otherwise it uses
dobj
itself as target. It can thus be called with either a data
object or DEPEND_0 object as argument. It uses the join option and gap
handling most suitable to the data type
and spacing. Linear join is used for numeric data types
(double, int, time) unless the target spacing is more than
twice the object's original DEPEND_0 spacing, in which case a
boxcar is used. Nearest neighbour is used for string and event
data types. The result is a var pointer to a new object
resulting from this
being joined onto target,
which must be a sequence of either DvTime or double values. If
withXrefs is true (default) the xrefs are attached and joined
as necessary.
The
defaults are safe for most plugin use, and the single method Join()
below is all that is needed.
DvObject_var Join(DvObject_var &dobj, bool
withXrefs) is a convenience call taking a
var pointer as argument. It calls Join(DvObject &dobj, bool
withXrefs) above.
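The way Join() is described as choosing its default method can be summarised in a small decision function. This is a sketch of the stated rule only: the enums DataKind and JoinMethod and the function defaultJoinMethod are hypothetical, and comparing absolute spacings is an assumption made because spacings may be negative.

```cpp
#include <cassert>
#include <cmath>

enum class DataKind { Double, Int, Time, String, Event };   // hypothetical
enum class JoinMethod { Linear, Boxcar, NearestNeighbour }; // hypothetical

// Nearest neighbour for string and event data; otherwise linear, unless the
// target spacing is more than twice the source spacing, when a boxcar is used.
inline JoinMethod defaultJoinMethod(DataKind kind,
                                    double sourceSpacing,
                                    double targetSpacing) {
    if (kind == DataKind::String || kind == DataKind::Event) {
        return JoinMethod::NearestNeighbour;
    }
    if (std::fabs(targetSpacing) > 2.0 * std::fabs(sourceSpacing)) {
        return JoinMethod::Boxcar;
    }
    return JoinMethod::Linear;
}
```

Joining 1 s data onto a 4 s timeline would therefore average, while joining onto a 2 s timeline would still interpolate.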
bool isRegular(double
&spacing) returns true if the data are equally
spaced (DEPEND_0 object evenly spaced, e.g. regular time
tags). The argument spacing holds the value of this spacing on
return.
double get_spacing() returns
(usually) the minimum separation between values, e.g.
double
spacing = dobj->getDep0()->get_spacing();
This is assumed to be the nominal data spacing with data gaps
ignored. It retains the sign of the spacing, so a monotonic
decreasing series has a negative spacing (e.g. if Depend_0 is
t0 - t ). The algorithm sets the spacing to be the difference
between the first two entries in DEPEND_0, and then scans the
sequence and if an absolute separation smaller than the
absolute value of this spacing is found more than once, then
the second of these values is used as the new spacing. This
continues until all records are checked. This reduces the
likelihood of being confused by data glitches. Any spacing
less than 1.0e-20 is assumed to be a duplicate timetag and is
ignored.
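One reading of the spacing algorithm described above is sketched here. It is an interpretation and not the DVOS source: nominalSpacing is a hypothetical name, the counter is assumed to reset each time the spacing is replaced, and the starting value is taken as the first separation that is not a duplicate tag.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the signed nominal spacing of x, or 0.0 if none can be found.
inline double nominalSpacing(const std::vector<double> &x) {
    const double tiny = 1.0e-20;     // below this a separation is a duplicate tag
    double spacing = 0.0;
    bool haveSpacing = false;
    int smallerSeen = 0;             // smaller separations seen since last update
    for (std::size_t i = 1; i < x.size(); ++i) {
        const double d = x[i] - x[i - 1];
        if (std::fabs(d) < tiny) continue;            // ignore duplicate tags
        if (!haveSpacing) {                           // first usable separation
            spacing = d;
            haveSpacing = true;
            continue;
        }
        if (std::fabs(d) < std::fabs(spacing)) {
            // Only the second smaller separation replaces the current spacing,
            // so one isolated glitch is not taken as the nominal spacing.
            if (++smallerSeen >= 2) {
                spacing = d;
                smallerSeen = 0;
            }
        }
    }
    return spacing;
}
```

With this reading a single short separation in an otherwise regular series leaves the spacing unchanged, and a decreasing series returns a negative spacing.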
DvInt8
get_nanoSpacing() as above, but only applies to time
sequences and the spacing is returned in nanoseconds.
bool isJoined(DvObject
&arg) uses the Depend_0 of arg as the
argument if it has one, otherwise takes arg itself as
the argument. Compares the id of the Depend_0 of this object
with the id of the argument. Returns true if the id values are
the same, otherwise if either has no Depend_0, then isJoined() returns true if
the sequence length of this object and the argument are the
same, or the argument has a sequence length of one. Otherwise
(ids differ and both have a DEPEND_0) isJoined() does a fuzzy
comparison of each value of the Depend_0 against the target and
returns true only if they are all within a small tolerance
(1.e-10, good to picoseconds), otherwise returns false.
bool same(DvObject
&arg) is the equivalent call to isJoined() except that this object and arg are themselves the
DEPEND_0 object. This is a convenience call when testing
against a target set of timetags (or scalar DEPEND_0), e.g. if(
dep0->same(target_tt) )...
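The decision order in isJoined() can be sketched on a simplified record. This is an illustration under stated assumptions, not the DVOS code: Series, sameTimeline and isJoinedSketch are hypothetical names, an id of -1 is assumed to mean "no DEPEND_0", the tolerance is assumed to be absolute, and timelines of different length are assumed not to match.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical simplified view of an object and its DEPEND_0.
struct Series {
    int dep0Id;                 // id of the DEPEND_0 object, -1 if there is none
    std::vector<double> dep0;   // DEPEND_0 values (empty if there is none)
    std::size_t seqSize;        // number of records
};

// Element-by-element comparison within an absolute tolerance.
inline bool sameTimeline(const std::vector<double> &a,
                         const std::vector<double> &b,
                         double tol = 1.0e-10) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}

inline bool isJoinedSketch(const Series &self, const Series &arg) {
    const bool selfHas = self.dep0Id != -1;
    const bool argHas = arg.dep0Id != -1;
    if (selfHas && argHas && self.dep0Id == arg.dep0Id) return true;  // same object
    if (!selfHas || !argHas) {                                        // length test
        return self.seqSize == arg.seqSize || arg.seqSize == 1;
    }
    return sameTimeline(self.dep0, arg.dep0);                         // fuzzy test
}
```

The cheap id comparison comes first, so the element-by-element scan only runs when two distinct DEPEND_0 objects have to be compared.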
DvObject_var linearJoin(DvObject &target, bool withXrefs, DvMask
*Gmsk=NULL) joins this object onto the Depend_0 of the argument
if it exists, otherwise it uses the argument itself as target. It
can thus be called with either a data object or DEPEND_0
object as argument. It returns a var pointer
to a new DvObject containing this linearly joined onto the
target object which must be a sequence of either DvTime or
double values. The xrefs of the argument are attached to the
result (joined if necessary) if withXrefs is true (the
default). Options for gap width and gap handling may be set as
xrefs in the object prior to joining and propagate into the
resultant object for chaining operations (see Join Options
below). If the optional DvMask pointer is provided,
the DvMask must be created before linearJoin() is called, but
it will be resized inside linearJoin(). If provided it returns
a mask of the same length as the target with true for records
to be kept and false for those flagged as gaps. It is used by
MultiJoin. If not provided, then records are removed inside
linearJoin() as necessary if either gap or end handling is set
to DV_REMOVE.
DvObject_var boxcarJoin(DvObject &target, bool withXrefs, DvMask
*Gmsk=NULL) joins this object onto the Depend_0 of the argument
if it exists, otherwise it uses the argument itself as target. It
can thus be called with either a data object or DEPEND_0 object
as argument. It returns a var pointer to a new DvObject
containing this object boxcar joined onto the target object,
which must be a sequence of either DvTime or double values. The
xrefs of the argument are attached to the result (joined if
necessary) if withXrefs is true (the default). Options for gap
width and gap handling may be set as xrefs in the object prior to
joining and propagate into the resultant object for chaining
operations (see Join Options below). If the optional DvMask
pointer is provided, the DvMask must be created before the join
is called, but it will be resized inside the join. If provided it
returns a mask of the same length as the target with true for
records to be kept and false for those flagged as gaps. It is
used by MultiJoin. If not provided, then records are removed
inside the join as necessary if either gap or end handling is set
to DV_REMOVE.
DvObject_var nnJoin(DvObject &target, bool withXrefs, DvMask
*Gmsk=NULL) joins this object onto the Depend_0 of the argument
if it exists, otherwise it uses the argument itself as target. It
can thus be called with either a data object or DEPEND_0 object
as argument. It returns a var pointer to a new DvObject
containing this object nearest neighbour joined onto the target
object, which must be a sequence of either DvTime or double
values. The xrefs of the argument are attached to the result
(joined if necessary) if withXrefs is true (the default). Options
for gap width and gap handling may be set as xrefs in the object
prior to joining and propagate into the resultant object for
chaining operations (see Join Options below). If the optional
DvMask pointer is provided, the DvMask must be created before the
join is called, but it will be resized inside the join. If
provided it returns a mask of the same length as the target with
true for records to be kept and false for those flagged as gaps.
It is used by MultiJoin. If not provided, then records are
removed inside the join as necessary if either gap or end
handling is set to DV_REMOVE.
Note
that xrefs will be joined using the same method as the data
and this may not be appropriate for the data type. In rare
cases it may be necessary
to join certain xrefs separately and attach after joining the
data.
Join Options may be
added as xrefs to the DvObjects to be joined. If they are not
present defaults will be used.
The possibilities are:
DV_JOIN_METHOD can be DV_LINEAR, DV_NN or DV_BOXCAR,
and this choice is propagated into the resulting object, so it
may be specified at the start of a chain of operations, and
will override the default algorithm in Join().
DV_GAP_WIDTH
as a double is the gap tolerance for linear and nearest
neighbour joins. If this xref is not set the gap will be 1.5
times the target spacing.
DV_BOX_WIDTH
as a double is the boxcar width for a boxcar join. If this
xref is not set the boxcar width will be 2 times the target
spacing. For a boxcar any interval with too few points (see DV_MIN_BOXCAR)
will be treated as a gap, and DV_GAP_WIDTH is
ignored.
DV_MIN_BOXCAR
as an int is the minimum number of values acceptable to make
up the boxcar (for a boxcar join). The default is 3, and if
fewer records are found inside the box at a record, then that
record is taken to be a gap.
DV_GAP_OPTION
determines what join should do with gaps in the input data.
Options are DV_LINEAR, DV_NN, DV_REMOVE, DV_FILL (use
fillvalue) or DV_ZERO_FILL.
The default option for boxcar join is DV_LINEAR.
The default option for linear join is DV_LINEAR.
The default option for nearest neighbour join is DV_NN.
DV_END_OPTION
determines what join should do when data start after or end
before the target. Options are
DV_LINEAR, DV_NN, DV_REMOVE, DV_FILL (use fillvalue) or
DV_ZERO_FILL.
The default option for boxcar join is DV_NN.
The default option for linear join is DV_NN.
The default option for nearest neighbour join is DV_NN.
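The roles of the box width and minimum point count in a boxcar join can be sketched on plain vectors. This is a minimal illustration, not the DVOS code: BoxcarResult and boxcarSketch are hypothetical names, the box is assumed to be centred on each target tag with inclusive edges, and gaps are left as NaN here rather than filled by the gap option.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result: averaged values plus a flag where too few points were found.
struct BoxcarResult {
    std::vector<double> values;
    std::vector<bool> isGap;
};

// t, v: source tags and values; target: tags to average onto.
inline BoxcarResult boxcarSketch(const std::vector<double> &t,
                                 const std::vector<double> &v,
                                 const std::vector<double> &target,
                                 double boxWidth,
                                 int minPoints = 3) {
    BoxcarResult out;
    if (t.size() != v.size()) return out;
    const double half = 0.5 * boxWidth;
    for (double x : target) {
        double sum = 0.0;
        int n = 0;
        for (std::size_t i = 0; i < t.size(); ++i) {
            if (std::fabs(t[i] - x) <= half) {   // source point lies inside the box
                sum += v[i];
                ++n;
            }
        }
        if (n >= minPoints) {
            out.values.push_back(sum / n);
            out.isGap.push_back(false);
        } else {                                  // too few points: treat as a gap
            out.values.push_back(std::numeric_limits<double>::quiet_NaN());
            out.isGap.push_back(true);
        }
    }
    return out;
}
```

In a fuller treatment the records flagged as gaps would then be handled according to DV_GAP_OPTION and DV_END_OPTION after the averaging pass.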
DvObject_var interpAt(DvTime t)
returns a var pointer with the value(s) of this object interpolated
at time t and converted to
double if possible (see asDouble() for conversions). Returns
0.0 if the object has no valid time tags. The returned
object will have type double, sequence length 1 and array
dimensions the same as the input object.
DvObject_var getCentres() returns a
var pointer holding the values of this as though they were
centred between Delta_plus and Delta_minus.
Usually this is applied to the timeline and returns the
centre time tags.
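The centring can be sketched as below, assuming the common convention that each record covers the interval from tag - Delta_minus to tag + Delta_plus. That convention and the name centresSketch are assumptions, not taken from the DVOS source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the mid-point of [t - deltaMinus, t + deltaPlus] for each record.
// If the delta arrays do not match t in length, t is returned unchanged.
inline std::vector<double> centresSketch(const std::vector<double> &t,
                                         const std::vector<double> &deltaPlus,
                                         const std::vector<double> &deltaMinus) {
    std::vector<double> out(t);
    if (deltaPlus.size() != t.size() || deltaMinus.size() != t.size()) return out;
    for (std::size_t i = 0; i < t.size(); ++i) {
        out[i] = t[i] + 0.5 * (deltaPlus[i] - deltaMinus[i]);
    }
    return out;
}
```

A tag that marks the start of its interval (Delta_minus of zero) is therefore moved forward by half of Delta_plus.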
void applyLinearFill(DvMask &mask)
applies the mask to this object. For records where mask is
false the data value is linearly interpolated between the
neighbouring valid data points. It permits linear filling of
gaps for the boxcar where we require the interpolation to be
between the boxcar averaged values, and which must therefore
be done after the boxcar is complete. It is
for internal use.
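A mask-driven linear fill can be sketched as follows. This is an illustration and not the DVOS method: applyLinearFillSketch is a hypothetical name, interpolation is done by record index rather than by DEPEND_0 value, and runs at either end are assumed to copy the nearest valid value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces values where mask is false by interpolating between the nearest
// valid neighbours on each side (by record index).
inline void applyLinearFillSketch(std::vector<double> &v,
                                  const std::vector<bool> &mask) {
    const std::size_t n = v.size();
    if (mask.size() != n) return;
    std::size_t i = 0;
    while (i < n) {
        if (mask[i]) { ++i; continue; }
        std::size_t j = i;
        while (j < n && !mask[j]) ++j;            // records [i, j) need filling
        const bool hasLeft = i > 0;
        const bool hasRight = j < n;
        for (std::size_t k = i; k < j; ++k) {
            if (hasLeft && hasRight) {
                const std::size_t left = i - 1;
                const double w = static_cast<double>(k - left) /
                                 static_cast<double>(j - left);
                v[k] = v[left] + w * (v[j] - v[left]);
            } else if (hasLeft) {
                v[k] = v[i - 1];                  // trailing run: copy last valid
            } else if (hasRight) {
                v[k] = v[j];                      // leading run: copy first valid
            }
        }
        i = j;
    }
}
```

Running the fill after the averaging pass means the interpolation uses the boxcar averaged neighbours, which is the ordering the description calls for.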