at.tuwien.ifs.somtoolbox.data
Class SOMLibSparseInputData

java.lang.Object
  extended by at.tuwien.ifs.somtoolbox.data.AbstractSOMLibSparseInputData
      extended by at.tuwien.ifs.somtoolbox.data.SOMLibSparseInputData
All Implemented Interfaces:
InputData
Direct Known Subclasses:
AbstractNormaliser, ARFFFormatInputData, ESOMInputData, SOMLibSparseInputDataNames, SOMPAKInputData, VectorFile2DatabaseImporter.InputVectorImporter, VectorFileToRandomAccessFileConverter

public class SOMLibSparseInputData
extends AbstractSOMLibSparseInputData

Implements InputData based on a SOMLib Input Vector File.

Version:
$Id: SOMLibSparseInputData.java 3971 2010-12-15 13:18:39Z mayer $
Author:
Michael Dittenbach

Field Summary
private  boolean containsMissingValues
           
protected  cern.colt.matrix.DoubleMatrix2D data
          The actual data.
static boolean DEFAULT_NORMALISED
           
static int DEFAULT_NUM_CACHE_BLOCKS
           
static int DEFAULT_RANDOM_SEED
           
static boolean DEFAULT_SPARSE
           
static String INPUT_VECTOR_FILE_FORMAT_CORRUPT_MESSAGE
           
protected  int[] nonZeros
          Counts how many of the feature values are not zero; stores an int value for each vector in the input data.
protected  boolean sparse
           
private  int ydim
           
 
Fields inherited from class at.tuwien.ifs.somtoolbox.data.AbstractSOMLibSparseInputData
classInfo, content_subtype, content_type, dataNames, dim, ERROR_MESSAGE_FILE_FORMAT_CORRUPT, featureMatrixCols, featureMatrixRows, isNormalized, meanVector, mqe0, nameCache, numVectors, rand, source, templateVector
 
Fields inherited from interface at.tuwien.ifs.somtoolbox.data.InputData
inputFileNameSuffix, MISSING_VALUE
 
Constructor Summary
protected SOMLibSparseInputData()
           
protected SOMLibSparseInputData(cern.colt.matrix.DoubleMatrix2D data, String[] dataNames, boolean norm, Random rand, TemplateVector tv, SOMLibClassInformation clsInfo)
          Constructor intended for subset generation.
  SOMLibSparseInputData(InputDatum[] inputData, SOMLibClassInformation classInfo)
          Constructor intended for generated synthetic data.
  SOMLibSparseInputData(String vectorFileName)
          Uses default values for sparsity (true), normalisation (true), cache blocks (1) and seed (7).
  SOMLibSparseInputData(String vectorFileName, boolean sparse, boolean norm, int numCacheBlocks, long seed)
           
  SOMLibSparseInputData(String vectorFileName, String templateFileName)
           
  SOMLibSparseInputData(String vectorFileName, String templateFileName, boolean sparse, boolean norm, int numCacheBlocks, long seed)
           
  SOMLibSparseInputData(String vectorFileName, String templateFileName, String classInfoFileName)
           
  SOMLibSparseInputData(String vectorFileName, String templateFileName, String classInfoFileName, boolean sparse, boolean norm, int numCacheBlocks, long seed)
           
 
Method Summary
protected  void addInstance(int index, String label)
           
static long getDimensionality(String vectorFileName)
           
 InputDatum getInputDatum(int index)
          Get an input datum with a specified index.
 double[] getInputVector(int d)
          Get the vector for the input datum with the specified index.
 double getValue(int x, int y)
          Returns the value of the y-th feature of input vector x.
 void init(boolean sparse, boolean norm, long seed)
           
protected  void initDataStructures(boolean sparse)
           
private  void initFromExistingData(cern.colt.matrix.DoubleMatrix2D data, String[] dataNames, boolean norm, Random rand, TemplateVector tv, SOMLibClassInformation clsInfo)
           
protected  void initMatrix(boolean sparse)
           
static void main(String[] args)
          Method for stand-alone execution, prints useful information about the input data.
 double mqe0(DistanceMetric metric)
          Calculates the mean quantisation error of the top-level unit.
protected static BufferedReader openFile(String vectorFileName)
           
protected  double parseDouble(String s)
           
protected  void processLine(int index, String[] lineElements)
          Process a single line of the input vector file.
protected  void readVectorFile(String vectorFileName, boolean sparse)
          Reads the input data from the given file, which has to follow the Input Vector File specification.
 void setLabel(int index, String name)
           
protected  void setMatrixValue(int row, int column, double value)
           
 InputData subset(String[] names)
          Gets a subset of this input data set.
 
Methods inherited from class at.tuwien.ifs.somtoolbox.data.AbstractSOMLibSparseInputData
classInformation, create, dim, equals, getByNameDistanceSorted, getContentSubType, getContentType, getData, getData, getDataIntervals, getDataSource, getDistanceMatrix, getDistances, getFeatureDensities, getFeatureMatrixColumns, getFeatureMatrixRows, getFileNameSuffix, getFormatName, getInputDatum, getInputDatum, getInputDatumIndex, getLabel, getLabels, getMeanVector, getMeanVector, getNearestN, getNearestN, getNearestNUnsorted, getRandomInputDatum, initDistanceMatrix, isNormalizedToUnitLength, numVectors, setClassInfo, setTemplateVector, templateVector, transformValues
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INPUT_VECTOR_FILE_FORMAT_CORRUPT_MESSAGE

public static final String INPUT_VECTOR_FILE_FORMAT_CORRUPT_MESSAGE
See Also:
Constant Field Values

DEFAULT_NORMALISED

public static final boolean DEFAULT_NORMALISED
See Also:
Constant Field Values

DEFAULT_NUM_CACHE_BLOCKS

public static final int DEFAULT_NUM_CACHE_BLOCKS
See Also:
Constant Field Values

DEFAULT_RANDOM_SEED

public static final int DEFAULT_RANDOM_SEED
See Also:
Constant Field Values

DEFAULT_SPARSE

public static final boolean DEFAULT_SPARSE
See Also:
Constant Field Values

containsMissingValues

private boolean containsMissingValues

nonZeros

protected int[] nonZeros
Counts how many of the feature values are not zero; stores an int value for each vector in the input data.
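
The per-vector count described above can be illustrated with a minimal sketch. It assumes the data is available as a plain `double[][]` with one row per input vector; the class `NonZeroCounter` and its method are hypothetical names for illustration and are not part of SOMToolbox, which stores its data in a Colt `DoubleMatrix2D`.

```java
/** Hypothetical helper for illustration only; not part of SOMToolbox. */
final class NonZeroCounter {

    /** Returns, for each row (input vector), how many feature values are not zero. */
    static int[] countNonZeros(double[][] vectors) {
        int[] nonZeros = new int[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            for (double value : vectors[i]) {
                if (value != 0.0) {
                    nonZeros[i]++;
                }
            }
        }
        return nonZeros;
    }

    public static void main(String[] args) {
        double[][] vectors = { {0.0, 1.5, 0.0}, {2.0, 0.0, 3.0}, {0.0, 0.0, 0.0} };
        // prints [1, 2, 0]
        System.out.println(java.util.Arrays.toString(countNonZeros(vectors)));
    }
}
```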


sparse

protected boolean sparse

data

protected cern.colt.matrix.DoubleMatrix2D data
The actual data. Each row in the matrix represents one vector.


ydim

private int ydim

Constructor Detail

SOMLibSparseInputData

public SOMLibSparseInputData(InputDatum[] inputData,
                             SOMLibClassInformation classInfo)
Constructor intended for generated synthetic data.


SOMLibSparseInputData

protected SOMLibSparseInputData(cern.colt.matrix.DoubleMatrix2D data,
                                String[] dataNames,
                                boolean norm,
                                Random rand,
                                TemplateVector tv,
                                SOMLibClassInformation clsInfo)
Constructor intended for subset generation.


SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName)
Uses default values for sparsity (true), normalisation (true), cache blocks (1) and seed (7).


SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName,
                             boolean sparse,
                             boolean norm,
                             int numCacheBlocks,
                             long seed)

SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName,
                             String templateFileName)

SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName,
                             String templateFileName,
                             boolean sparse,
                             boolean norm,
                             int numCacheBlocks,
                             long seed)

SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName,
                             String templateFileName,
                             String classInfoFileName)
                      throws SOMToolboxException
Throws:
SOMToolboxException

SOMLibSparseInputData

public SOMLibSparseInputData(String vectorFileName,
                             String templateFileName,
                             String classInfoFileName,
                             boolean sparse,
                             boolean norm,
                             int numCacheBlocks,
                             long seed)
                      throws SOMToolboxException
Throws:
SOMToolboxException

SOMLibSparseInputData

protected SOMLibSparseInputData()

Method Detail

initFromExistingData

private void initFromExistingData(cern.colt.matrix.DoubleMatrix2D data,
                                  String[] dataNames,
                                  boolean norm,
                                  Random rand,
                                  TemplateVector tv,
                                  SOMLibClassInformation clsInfo)

init

public void init(boolean sparse,
                 boolean norm,
                 long seed)

getInputDatum

public InputDatum getInputDatum(int index)
Description copied from interface: InputData
Get an input datum with a specified index.

Parameters:
index - the index of the input datum.
Returns:
the input datum.

getInputVector

public double[] getInputVector(int d)
Description copied from interface: InputData
Get the vector for the input datum with the specified index.


getValue

public double getValue(int x,
                       int y)
Description copied from interface: InputData
Returns the value of the y-th feature of input vector x.


mqe0

public double mqe0(DistanceMetric metric)
Description copied from interface: InputData
Calculates the mean quantisation error of the top-level unit.

Parameters:
metric - the metric to use for distance calculation.
Returns:
the mqe0.
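
A minimal sketch of one way such a value can be computed, under stated assumptions: the input vectors are given as a non-empty `double[][]`, the Euclidean distance stands in for the `DistanceMetric` argument, and the result is the average distance of all input vectors to their mean vector. Whether this class normalises by the number of vectors is not stated on this page, so the division is an assumption. `Mqe0Sketch` and its methods are hypothetical names.

```java
/** Hypothetical illustration; not the SOMToolbox implementation. */
final class Mqe0Sketch {

    /** Component-wise mean of all input vectors (assumes at least one vector). */
    static double[] meanVector(double[][] vectors) {
        int dim = vectors[0].length;
        double[] mean = new double[dim];
        for (double[] v : vectors) {
            for (int j = 0; j < dim; j++) {
                mean[j] += v[j];
            }
        }
        for (int j = 0; j < dim; j++) {
            mean[j] /= vectors.length;
        }
        return mean;
    }

    /** Euclidean distance, standing in for a DistanceMetric implementation. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Average distance of the input vectors to their mean vector (assumed definition). */
    static double mqe0(double[][] vectors) {
        double[] mean = meanVector(vectors);
        double total = 0.0;
        for (double[] v : vectors) {
            total += euclidean(mean, v);
        }
        return total / vectors.length;
    }

    public static void main(String[] args) {
        double[][] vectors = { {0.0, 0.0}, {2.0, 0.0} };
        // The mean is (1, 0) and each vector lies at distance 1 from it, so this prints 1.0
        System.out.println(mqe0(vectors));
    }
}
```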

readVectorFile

protected void readVectorFile(String vectorFileName,
                              boolean sparse)
Reads the input data from the given file, which has to follow the Input Vector File specification. Additionally calculates the AbstractSOMLibSparseInputData.meanVector and creates the AbstractSOMLibSparseInputData.nameCache for faster index search.

Parameters:
vectorFileName - the name of the input vector file.
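
The reading step described above (parse the file, compute the mean vector, build a name cache for index lookup) might look roughly like the following sketch. It makes several assumptions that this page does not confirm: header lines start with `$`, with `$XDIM` giving the number of vectors and `$VEC_DIM` the dimensionality; lines starting with `#` are comments; each data line holds the feature values followed by a label. `VectorFileSketch` and all of its members are hypothetical names, and the sketch uses dense arrays rather than a Colt matrix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of reading an input vector file; not the SOMToolbox code. */
final class VectorFileSketch {
    double[][] data;      // one row per input vector
    String[] labels;      // label of each vector
    double[] meanVector;  // component-wise mean of all vectors
    final Map<String, Integer> nameCache = new HashMap<>(); // label -> row index

    /** Parses the given lines; throws IllegalArgumentException if the format looks corrupt. */
    static VectorFileSketch parse(List<String> lines) {
        VectorFileSketch result = new VectorFileSketch();
        int numVectors = -1;
        int dim = -1;
        int index = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] elements = line.split("\\s+");
            if (line.startsWith("$")) {
                // Header line: only the two values needed for allocation are read here.
                if (elements.length < 2) {
                    throw new IllegalArgumentException("Header line without value: " + line);
                }
                if (elements[0].equals("$XDIM")) {
                    numVectors = Integer.parseInt(elements[1]);
                } else if (elements[0].equals("$VEC_DIM")) {
                    dim = Integer.parseInt(elements[1]);
                }
                continue;
            }
            if (result.data == null) {
                // First data line: the header must be complete before allocating.
                if (numVectors < 0 || dim < 0) {
                    throw new IllegalArgumentException("Header is incomplete");
                }
                result.data = new double[numVectors][dim];
                result.labels = new String[numVectors];
                result.meanVector = new double[dim];
            }
            if (index >= numVectors || elements.length != dim + 1) {
                throw new IllegalArgumentException("Corrupt data line for vector " + index);
            }
            for (int j = 0; j < dim; j++) {
                double value = Double.parseDouble(elements[j]);
                result.data[index][j] = value;
                result.meanVector[j] += value; // accumulate the sum, divided below
            }
            result.labels[index] = elements[dim];
            result.nameCache.put(elements[dim], index);
            index++;
        }
        if (result.data == null || index != numVectors) {
            throw new IllegalArgumentException("Number of vectors does not match the header");
        }
        for (int j = 0; j < dim; j++) {
            result.meanVector[j] /= numVectors;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "$TYPE vec", "$XDIM 2", "$YDIM 1", "$VEC_DIM 3",
                "1.0 0.0 2.0 first",
                "3.0 0.0 0.0 second");
        VectorFileSketch in = parse(lines);
        System.out.println(in.nameCache.get("second"));     // prints 1
        System.out.println(Arrays.toString(in.meanVector)); // prints [2.0, 0.0, 1.0]
    }
}
```

Accumulating the mean while parsing avoids a second pass over the data, and the label-to-index map turns later lookups by name into constant-time operations.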

initDataStructures

protected void initDataStructures(boolean sparse)

initMatrix

protected void initMatrix(boolean sparse)

openFile

protected static BufferedReader openFile(String vectorFileName)

processLine

protected void processLine(int index,
                           String[] lineElements)
                    throws Exception
Process a single line of the input vector file.

Parameters:
index - the line index
lineElements - the line elements, split by the delimiters
Throws:
Exception

parseDouble

protected double parseDouble(String s)

setMatrixValue

protected void setMatrixValue(int row,
                              int column,
                              double value)

addInstance

protected void addInstance(int index,
                           String label)

subset

public InputData subset(String[] names)
Description copied from interface: InputData
Gets a subset of this input data set. The input data in the subset are identified by the specified labels.

Parameters:
names - the label names of the desired subset data.
Returns:
a subset of the data.
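
Selecting a subset by label can be sketched as follows, assuming the data is held as a `double[][]` with a parallel array of labels. `SubsetSketch` is a hypothetical name, and rejecting unknown labels with an exception is an assumption; this page does not say how the actual method treats labels that are not present.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of label-based subsetting; not the SOMToolbox code. */
final class SubsetSketch {

    /** Returns copies of the rows whose labels are listed in names, in the order of names. */
    static double[][] subset(double[][] data, String[] labels, String[] names) {
        // Build a label -> row index map once so each lookup is constant time.
        Map<String, Integer> nameCache = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            nameCache.put(labels[i], i);
        }
        double[][] result = new double[names.length][];
        for (int k = 0; k < names.length; k++) {
            Integer index = nameCache.get(names[k]);
            if (index == null) {
                throw new IllegalArgumentException("Unknown label: " + names[k]);
            }
            result[k] = data[index].clone(); // copy so the subset does not alias the original
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0} };
        String[] labels = { "a", "b", "c" };
        // prints [[5.0, 6.0], [1.0, 2.0]]
        System.out.println(Arrays.deepToString(subset(data, labels, new String[] { "c", "a" })));
    }
}
```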

main

public static void main(String[] args)
                 throws Exception
Method for stand-alone execution, prints useful information about the input data.

Throws:
Exception

getDimensionality

public static long getDimensionality(String vectorFileName)

setLabel

public void setLabel(int index,
                     String name)