at.tuwien.ifs.somtoolbox.data
Interface InputData

All Known Implementing Classes:
AbstractNormaliser, AbstractSOMLibSparseInputData, ARFFFormatInputData, DataBaseSOMLibSparseInputData, ESOMInputData, MarsyasARFFInputData, MinMaxNormaliser, RandomAccessFileSOMLibInputData, SimpleMatrixInputData, SOMLibSparseInputData, SOMLibSparseInputDataNames, SOMPAKInputData, StandardScoreNormaliser, UnitLengthNormaliser, VectorFile2DatabaseImporter.InputVectorImporter, VectorFileToRandomAccessFileConverter

public interface InputData

The InputData provides the input vectors to be used for the training process of a Self-Organizing Map. The data structure from which an InputData is constructed is normally generated by a parser or a vector generator program.

Version:
$Id: InputData.java 3589 2010-05-21 10:42:01Z mayer $
Author:
Michael Dittenbach, Rudolf Mayer

Field Summary
static String inputFileNameSuffix
           
static double MISSING_VALUE
           
 
Method Summary
 SOMLibClassInformation classInformation()
          Gets the class info associated with this input data.
 int dim()
          Gets the dimension of the input data.
 String getContentSubType()
          Gets the content sub-type.
 String getContentType()
          Gets the content type.
 double[][] getData()
          Returns the input data as a two-dimensional double array, i.e. a matrix of numVectors x dim.
 double[][] getData(String className)
          Returns the vectors of all inputs associated with the given class name.
 double[][] getDataIntervals()
          Returns the min and max values for each feature, in a matrix of dim x 2.
 String getDataSource()
          Returns the name/URI/etc. of the source from which this input data was read.
 int getFeatureMatrixColumns()
          Gets the number of columns before vectorisation.
 int getFeatureMatrixRows()
          Gets the number of rows before vectorisation.
 InputDatum getInputDatum(int d)
          Get an input datum with a specified index.
 InputDatum getInputDatum(String label)
          Get an input datum with a specified label.
 InputDatum[] getInputDatum(String[] labels)
          Returns an array of input data with the specified labels.
 double[] getInputVector(int d)
          Get the vector of the input datum with the specified index.
 String getLabel(int index)
          Return the label of the input vector at the given index.
 String[] getLabels()
          Returns an array containing the labels of all the input data.
 cern.colt.matrix.DoubleMatrix1D getMeanVector()
          Gets the mean vector of the input vectors.
 cern.colt.matrix.DoubleMatrix1D getMeanVector(String[] labels)
          Returns the mean vector of the input vectors with the specified labels.
 InputDatum getRandomInputDatum(int iteration, int numIterations)
          Gets a random input sample from the input data set.
 double getValue(int x, int y)
          Returns the value of the y-th feature of input vector x.
 boolean isNormalizedToUnitLength()
          Indicates whether this data set has been normalised to the unit length.
 double mqe0(DistanceMetric metric)
          Calculates the mean quantisation error of the top-level unit.
 int numVectors()
          Gives the size of this input data set.
 void setClassInfo(SOMLibClassInformation classInfo)
          Sets the class information associated with this input data.
 void setTemplateVector(TemplateVector templateVector)
          Sets the template vector to be associated with this input data.
 InputData subset(String[] names)
          Gets a subset of this input data set.
 TemplateVector templateVector()
          Gets the template vector associated with this input data.
 

Field Detail

MISSING_VALUE

static final double MISSING_VALUE
See Also:
Constant Field Values

inputFileNameSuffix

static final String inputFileNameSuffix
See Also:
Constant Field Values

Method Detail

isNormalizedToUnitLength

boolean isNormalizedToUnitLength()
Indicates whether this data set has been normalised to the unit length.

Returns:
true if this data set is normalised, false otherwise.
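
A minimal sketch of how such a flag could be derived, assuming that "unit length" means every input vector has a Euclidean (L2) norm of 1, checked within a tolerance. The class UnitLengthCheck and its tolerance parameter are hypothetical and not part of SOMToolbox; an implementation such as UnitLengthNormaliser may instead simply record that it has normalised the data.

```java
/** Hypothetical helper, not part of SOMToolbox. */
final class UnitLengthCheck {
    private UnitLengthCheck() {}

    /** Returns true if every row of data has an L2 norm within tolerance of 1. */
    static boolean isNormalizedToUnitLength(double[][] data, double tolerance) {
        for (double[] vector : data) {
            double sumOfSquares = 0.0;
            for (double v : vector) {
                sumOfSquares += v * v;
            }
            // A single vector that is not of unit length makes the whole set non-normalised.
            if (Math.abs(Math.sqrt(sumOfSquares) - 1.0) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```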

dim

int dim()
Gets the dimension of the input data.

Returns:
the dimension.

numVectors

int numVectors()
Gives the size of this input data set.

Returns:
the number of vectors.

getRandomInputDatum

InputDatum getRandomInputDatum(int iteration,
                               int numIterations)
Gets a random input sample from the input data set.

Returns:
the random input data.
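
The interface does not document how iteration and numIterations influence sampling. The hypothetical UniformSampler below assumes the simplest behaviour: it draws an index uniformly at random, with replacement, and ignores both arguments. Real implementations may use them differently.

```java
import java.util.Random;

/** Hypothetical sampler, not the SOMToolbox implementation. */
final class UniformSampler {
    private final Random random;
    private final int numVectors;

    UniformSampler(int numVectors, long seed) {
        if (numVectors <= 0) {
            throw new IllegalArgumentException("empty data set");
        }
        this.numVectors = numVectors;
        this.random = new Random(seed); // fixed seed makes runs reproducible
    }

    /**
     * Returns the index of a randomly drawn input vector in [0, numVectors).
     * The iteration arguments mirror the interface signature but are ignored here.
     */
    int randomIndex(int iteration, int numIterations) {
        return random.nextInt(numVectors);
    }
}
```

A training loop would typically call getRandomInputDatum(i, numIterations) once per iteration i; in this sketch the returned index would then be passed to getInputDatum(int).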

getInputDatum

InputDatum getInputDatum(int d)
Get an input datum with a specified index.

Parameters:
d - the index of the input datum.
Returns:
the input datum.

getInputVector

double[] getInputVector(int d)
Get the vector of the input datum with the specified index.

Parameters:
d - the index of the input datum.
Returns:
the input vector.

getInputDatum

InputDatum getInputDatum(String label)
Get an input datum with a specified label.

Parameters:
label - the name of the input datum.
Returns:
the input datum.

getInputDatum

InputDatum[] getInputDatum(String[] labels)
Returns an array of input data with the specified labels.

Parameters:
labels - the labels of the input data.
Returns:
the input data.

getLabels

String[] getLabels()
Returns an array containing the labels of all the input data.


getLabel

String getLabel(int index)
Return the label of the input vector at the given index.


getMeanVector

cern.colt.matrix.DoubleMatrix1D getMeanVector()
Gets the mean vector of the input vectors.

Returns:
the mean vector.
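
A sketch of the computation, assuming the mean vector is the component-wise arithmetic mean of all input vectors. To stay self-contained it returns a plain double[] rather than the cern.colt.matrix.DoubleMatrix1D the interface specifies; with Colt on the classpath, the result could be wrapped with new DenseDoubleMatrix1D(values). MeanVector is a hypothetical helper.

```java
/** Hypothetical helper: component-wise mean of the rows of a numVectors x dim matrix. */
final class MeanVector {
    private MeanVector() {}

    static double[] mean(double[][] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("no input vectors");
        }
        int dim = data[0].length;
        double[] mean = new double[dim];
        // Sum each feature over all vectors ...
        for (double[] vector : data) {
            for (int j = 0; j < dim; j++) {
                mean[j] += vector[j];
            }
        }
        // ... then divide by the number of vectors.
        for (int j = 0; j < dim; j++) {
            mean[j] /= data.length;
        }
        return mean;
    }
}
```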

getMeanVector

cern.colt.matrix.DoubleMatrix1D getMeanVector(String[] labels)
Returns the mean vector of the input vectors with the specified labels.

Parameters:
labels - label names of the input data.
Returns:
the mean vector.
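
For the labelled variant, a hypothetical helper might first map labels to row indices and then average only the selected rows. Treating an unknown label as an error is an assumption; the interface does not say what happens in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: mean of the input vectors whose labels are listed in wanted. */
final class LabelledMean {
    private LabelledMean() {}

    static double[] meanOf(String[] allLabels, double[][] data, String[] wanted) {
        if (wanted.length == 0) {
            throw new IllegalArgumentException("no labels given");
        }
        // Map each label to the row index of its vector.
        Map<String, Integer> indexByLabel = new HashMap<>();
        for (int i = 0; i < allLabels.length; i++) {
            indexByLabel.put(allLabels[i], i);
        }
        double[] mean = new double[data[0].length];
        for (String label : wanted) {
            Integer index = indexByLabel.get(label);
            if (index == null) {
                // Assumption: unknown labels are treated as an error.
                throw new IllegalArgumentException("unknown label: " + label);
            }
            for (int j = 0; j < mean.length; j++) {
                mean[j] += data[index][j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= wanted.length;
        }
        return mean;
    }
}
```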

templateVector

TemplateVector templateVector()
Gets the template vector associated with this input data.

Returns:
the template vector, or null if the template vector was not specified.

classInformation

SOMLibClassInformation classInformation()
Gets the class info associated with this input data.

Returns:
the class info, or null if the class info file was not specified.

setTemplateVector

void setTemplateVector(TemplateVector templateVector)
Sets the template vector to be associated with this input data.

Parameters:
templateVector - the new template vector.

mqe0

double mqe0(DistanceMetric metric)
Calculates the mean quantisation error of the top-level unit.

Parameters:
metric - the metric to use for distance calculation.
Returns:
the mqe0.
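
In the Growing Hierarchical SOM, the single unit on the top layer (layer 0) has the mean of all input vectors as its weight vector m0, and mqe0 is the average distance between m0 and the n input vectors, i.e. mqe0 = (1/n) * sum over i of d(m0, x_i). Assuming that definition, the sketch below computes it; VectorMetric is a hypothetical stand-in for DistanceMetric, and Euclidean distance is used only as an example metric.

```java
/** Hypothetical stand-in for SOMToolbox's DistanceMetric. */
interface VectorMetric {
    double distance(double[] a, double[] b);
}

/** Hypothetical helper computing mqe0 under the GHSOM definition. */
final class Mqe0 {
    private Mqe0() {}

    /** Euclidean distance, used here only as an example metric. */
    static final VectorMetric EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    };

    /** Mean distance of all input vectors to their mean vector (the assumed top-level weight vector). */
    static double mqe0(double[][] data, VectorMetric metric) {
        int dim = data[0].length;
        double[] mean = new double[dim];
        for (double[] vector : data) {
            for (int j = 0; j < dim; j++) {
                mean[j] += vector[j];
            }
        }
        for (int j = 0; j < dim; j++) {
            mean[j] /= data.length;
        }
        double total = 0.0;
        for (double[] vector : data) {
            total += metric.distance(mean, vector);
        }
        return total / data.length;
    }
}
```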

subset

InputData subset(String[] names)
Gets a subset of this input data set. The input data in the subset are identified by the specified labels.

Parameters:
names - the label names of the desired subset data.
Returns:
a subset of the data.
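
A minimal sketch of label-based subsetting, using a hypothetical LabelledData class in place of a full InputData. It keeps the matching vectors in their original order and silently ignores names that match no label; both choices are assumptions, as is sharing (rather than copying) the row arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal labelled data set; not a SOMToolbox class. */
final class LabelledData {
    final String[] labels;
    final double[][] vectors;

    LabelledData(String[] labels, double[][] vectors) {
        if (labels.length != vectors.length) {
            throw new IllegalArgumentException("labels and vectors differ in length");
        }
        this.labels = labels;
        this.vectors = vectors;
    }

    /** Keeps the vectors whose labels appear in names, in their original order; rows are shared, not copied. */
    LabelledData subset(String[] names) {
        Set<String> wanted = new HashSet<>(Arrays.asList(names));
        List<String> keptLabels = new ArrayList<>();
        List<double[]> keptVectors = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (wanted.contains(labels[i])) {
                keptLabels.add(labels[i]);
                keptVectors.add(vectors[i]);
            }
        }
        return new LabelledData(keptLabels.toArray(new String[0]), keptVectors.toArray(new double[0][]));
    }
}
```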

getData

double[][] getData()
Returns the input data as a two-dimensional double array, i.e. a matrix of numVectors x dim.

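The relation between getData, getInputVector and getValue can be sketched with a hypothetical, trimmed-down interface whose default methods copy every vector into a numVectors x dim matrix. MiniInputData and ArrayBackedData are illustrative only; whether a real implementation returns a copy or its internal array is not specified here, and the sketch copies.

```java
/** Hypothetical, trimmed-down stand-in for a few InputData methods. */
interface MiniInputData {
    int numVectors();

    int dim();

    double[] getInputVector(int d);

    /** Copies every input vector into a new numVectors x dim matrix. */
    default double[][] getData() {
        double[][] matrix = new double[numVectors()][dim()];
        for (int i = 0; i < matrix.length; i++) {
            System.arraycopy(getInputVector(i), 0, matrix[i], 0, dim());
        }
        return matrix;
    }

    /** Value of feature y of input vector x, equal to getData()[x][y] without building the matrix. */
    default double getValue(int x, int y) {
        return getInputVector(x)[y];
    }
}

/** Hypothetical implementation backed by a plain array of rows. */
final class ArrayBackedData implements MiniInputData {
    private final double[][] rows;

    ArrayBackedData(double[][] rows) {
        this.rows = rows;
    }

    @Override
    public int numVectors() {
        return rows.length;
    }

    @Override
    public int dim() {
        return rows.length == 0 ? 0 : rows[0].length;
    }

    @Override
    public double[] getInputVector(int d) {
        return rows[d];
    }
}
```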

getDataIntervals

double[][] getDataIntervals()
Returns the min and max values for each feature, in a matrix of dim x 2.

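A sketch of the per-feature interval computation, assuming column 0 holds the minimum and column 1 the maximum of each feature. It does not treat MISSING_VALUE entries specially, which a real implementation might need to do. DataIntervals is a hypothetical helper.

```java
/** Hypothetical helper computing per-feature [min, max] intervals. */
final class DataIntervals {
    private DataIntervals() {}

    /** Returns a dim x 2 matrix: [j][0] is the minimum and [j][1] the maximum of feature j. */
    static double[][] intervals(double[][] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("no input vectors");
        }
        int dim = data[0].length;
        double[][] result = new double[dim][2];
        for (int j = 0; j < dim; j++) {
            result[j][0] = Double.POSITIVE_INFINITY;
            result[j][1] = Double.NEGATIVE_INFINITY;
        }
        // One pass over all vectors, updating min and max of every feature.
        for (double[] vector : data) {
            for (int j = 0; j < dim; j++) {
                result[j][0] = Math.min(result[j][0], vector[j]);
                result[j][1] = Math.max(result[j][1], vector[j]);
            }
        }
        return result;
    }
}
```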

getValue

double getValue(int x,
                int y)
Returns the value of the y-th feature of input vector x.


getFeatureMatrixRows

int getFeatureMatrixRows()
Gets the number of rows before vectorisation.

Returns:
the number of rows of the feature matrix before it was vectorised into an input vector, or -1 if not available.

getFeatureMatrixColumns

int getFeatureMatrixColumns()
Gets the number of columns before vectorisation.

Returns:
the number of columns of the feature matrix before it was vectorised into an input vector, or -1 if not available.

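Assuming row-major vectorisation (the interface does not state the order), a feature matrix of getFeatureMatrixRows() x getFeatureMatrixColumns() values flattens into an input vector with that many components, and the matrix can be rebuilt when both counts are known. FeatureMatrixReshape is hypothetical; a value of -1 for either count is treated as "not available".

```java
/** Hypothetical helper for converting between a feature matrix and an input vector. */
final class FeatureMatrixReshape {
    private FeatureMatrixReshape() {}

    /** Flattens a rows x cols matrix into one vector, row by row (row-major order is an assumption). */
    static double[] vectorise(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[] vector = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(matrix[r], 0, vector, r * cols, cols);
        }
        return vector;
    }

    /** Rebuilds the matrix; rows and cols would come from getFeatureMatrixRows/Columns. */
    static double[][] devectorise(double[] vector, int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("feature matrix dimensions not available");
        }
        if (rows * cols != vector.length) {
            throw new IllegalArgumentException("rows * cols must equal the vector length");
        }
        double[][] matrix = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(vector, r * cols, matrix[r], 0, cols);
        }
        return matrix;
    }
}
```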
getContentType

String getContentType()
Gets the content type.

Returns:
the content type.

getContentSubType

String getContentSubType()
Gets the content sub-type.

Returns:
the content sub-type.

setClassInfo

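void setClassInfo(SOMLibClassInformation classInfo)
Sets the class information associated with this input data.

Parameters:
classInfo - the new class information.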

getData

double[][] getData(String className)
                   throws SOMToolboxException
Returns the vectors of all inputs associated with the given class name.

Throws:
SOMToolboxException - If no class information file is loaded
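
A sketch of class-based filtering, assuming the class information can be reduced to one class name per vector. MissingClassInfoException is a hypothetical stand-in for SOMToolboxException, and a null class array models the case where no class information file is loaded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for SOMToolboxException. */
class MissingClassInfoException extends Exception {
    MissingClassInfoException(String message) {
        super(message);
    }
}

/** Hypothetical helper selecting the vectors that belong to one class. */
final class ClassFilter {
    private ClassFilter() {}

    /**
     * Returns the vectors whose class name equals className.
     * classOfVector[i] is the class of vector i; null models "no class information loaded".
     */
    static double[][] getData(double[][] data, String[] classOfVector, String className)
            throws MissingClassInfoException {
        if (classOfVector == null) {
            throw new MissingClassInfoException("no class information loaded");
        }
        List<double[]> selected = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (className.equals(classOfVector[i])) {
                selected.add(data[i]);
            }
        }
        return selected.toArray(new double[0][]);
    }
}
```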

getDataSource

String getDataSource()
Returns the name/URI/etc. of the source from which this input data was read.