java.lang.Object
  at.tuwien.ifs.somtoolbox.data.AbstractSOMLibSparseInputData
public abstract class AbstractSOMLibSparseInputData
This abstract implementation provides basic support for operating on an InputData. Sub-classes have to implement constructors and methods to read input vectors and create an InputData object, for example by reading from a file or a database.
Field Summary | |
---|---|
protected SOMLibClassInformation |
classInfo
Any class label information attached to the input vectors. |
protected String |
content_subtype
The specific subtype of content type (user-definable, for example "rp", "rh", or "ssd" for Rhythm Patterns, Rhythm Histograms or Statistical Spectrum Descriptor audio feature types). |
protected String |
content_type
The content type of the vectors ("text", "audio", ...). |
String[] |
dataNames
The label/name of the vector. |
protected int |
dim
The dimension of the input vectors. |
private double[][] |
distanceMatrix
A matrix containing the pairwise distances between two vectors. FIXME: use LeightWeightMemoryInputVectorDistanceMatrix instead |
protected static String |
ERROR_MESSAGE_FILE_FORMAT_CORRUPT
|
protected int |
featureMatrixCols
Column dimension of the feature matrix before having been vectorized to input vector. |
protected int |
featureMatrixRows
Row dimension of the feature matrix before having been vectorized to input vector. |
private double[][] |
intervals
|
protected boolean |
isNormalized
Indicates whether the input data has been normalised. |
protected cern.colt.matrix.impl.DenseDoubleMatrix1D |
meanVector
The mean of all the input vectors. |
protected double |
mqe0
|
protected LinkedHashMap<String,Integer> |
nameCache
A mapping from the name to the index of an input vector, for faster access. |
protected int |
numVectors
The number of vectors in this input data collection. |
protected Random |
rand
|
protected String |
source
Where this input data was read from. |
protected TemplateVector |
templateVector
A TemplateVector attached to this input data. |
private double[][] |
transformedVectors
A transformation of the input vectors. |
Fields inherited from interface at.tuwien.ifs.somtoolbox.data.InputData |
---|
inputFileNameSuffix, MISSING_VALUE |
Constructor Summary | |
---|---|
protected |
AbstractSOMLibSparseInputData()
|
protected |
AbstractSOMLibSparseInputData(boolean norm,
Random random)
|
protected |
AbstractSOMLibSparseInputData(String[] dataNames,
int dim,
boolean norm,
Random rand,
TemplateVector tv,
SOMLibClassInformation clsInfo)
|
Method Summary | |
---|---|
private boolean |
assertEqual(Object name,
Object i1,
Object i2)
|
SOMLibClassInformation |
classInformation()
Gets the class info associated with this input data. |
static AbstractSOMLibSparseInputData |
create(InputDatum[] inputData,
SOMLibClassInformation classInfo)
|
int |
dim()
Gets the dimension of the input data. |
boolean |
equals(Object obj)
|
InputDatum[] |
getByNameDistanceSorted(double[] vector,
Collection<String> inputNames,
DistanceMetric metric)
Retrieves the InputDatum objects corresponding to the given input names, sorted by their distance to the given vector. |
String |
getContentSubType()
Gets the content sub-type. |
String |
getContentType()
Gets the content type. |
double[][] |
getData()
Returns the input data as a double array. |
double[][] |
getData(String className)
Returns the vectors of all inputs associated with the given class name |
double[][] |
getDataIntervals()
Return the min and max values for each feature, in a matrix of dim x 2 |
String |
getDataSource()
Returns the name/URI/etc. of the data source. |
double[][] |
getDistanceMatrix()
|
ArrayList<InputDistance> |
getDistances(int inputIndex,
DistanceMetric metric)
Returns the distances between the vector at the given index and the vectors of the dataset. |
Hashtable<Integer,Integer> |
getFeatureDensities()
Returns feature density statistics of the input data, namely a mapping from the number of input objects in which a specific feature is non-zero to the total number of features with that density. |
int |
getFeatureMatrixColumns()
Gets the number of columns before vectorisation. |
int |
getFeatureMatrixRows()
Gets the number of rows before vectorisation. |
static String |
getFileNameSuffix()
|
static String |
getFormatName()
|
InputDatum |
getInputDatum(String label)
Get an input datum with a specified label. |
InputDatum[] |
getInputDatum(String[] labels)
Returns an array of input data with the specified labels. |
int |
getInputDatumIndex(String label)
|
String |
getLabel(int index)
Return the label of the input vector at the given index. |
String[] |
getLabels()
Returns an array containing the labels of all the input data. |
cern.colt.matrix.DoubleMatrix1D |
getMeanVector()
Gets the mean vector of the input vectors. |
cern.colt.matrix.DoubleMatrix1D |
getMeanVector(String[] labels)
Returns the mean vector of the input vectors whose labels are given in the String[] array. |
InputDatum[] |
getNearestN(double[] vector,
DistanceMetric metric,
int number)
Retrieves the given number of InputDatum that are closest to the given vector. |
InputDatum[] |
getNearestN(int inputIndex,
DistanceMetric metric,
int number)
Returns the n input vectors nearest to the vector at the given index of the dataset. |
InputDatum[] |
getNearestNUnsorted(int inputIndex,
DistanceMetric metric,
int number)
|
private InputDatum[] |
getNNearest(ArrayList<InputDistance> distances)
|
private InputDatum[] |
getNNearest(int number,
ArrayList<InputDistance> distances)
|
InputDatum |
getRandomInputDatum(int iteration,
int numIterations)
Gets a random input sample from the input data set. |
void |
initDistanceMatrix(DistanceMetric metric)
Calculates the distanceMatrix - careful, this is a lengthy process and should be done only if needed. |
boolean |
isNormalizedToUnitLength()
Indicates whether this data set has been normalised to the unit length. |
int |
numVectors()
Gives the size of this input data set. |
void |
setClassInfo(SOMLibClassInformation classInfo)
|
void |
setTemplateVector(TemplateVector templateVector)
Sets the template vector to be associated with this input data. |
TemplateVector |
templateVector()
Gets the template vector associated with this input data. |
void |
transformValues(DistanceMetric metric)
Calculates the matrix of transformedVectors using DistanceMetric.transformVector(double[]) of
the given metric. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface at.tuwien.ifs.somtoolbox.data.InputData |
---|
getInputDatum, getInputVector, getValue, mqe0, subset |
Field Detail |
---|
protected static final String ERROR_MESSAGE_FILE_FORMAT_CORRUPT
protected String source
protected SOMLibClassInformation classInfo
public String[] dataNames
protected String content_type
An input file should use the following header format for content types:
$DATA_TYPE text
or
$DATA_TYPE audio-rp
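As a rough illustration of how such a header line could be split into the content_type and content_subtype fields, here is a minimal, self-contained sketch. The class `DataTypeHeader` and its `parse` method are hypothetical and not part of the toolbox, and the rule that the first `-` separates type from subtype is an assumption inferred from the `audio-rp` example above.

```java
/** Sketch only: splits a "$DATA_TYPE" header line into content type and subtype. */
class DataTypeHeader {
    final String contentType;
    final String contentSubType; // null when the header carries no subtype

    DataTypeHeader(String contentType, String contentSubType) {
        this.contentType = contentType;
        this.contentSubType = contentSubType;
    }

    /** Returns the parsed header, or null if the line is not a $DATA_TYPE line. */
    static DataTypeHeader parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("$DATA_TYPE")) {
            return null;
        }
        String value = parts[1];
        // Assumption: the first '-' separates the type from the subtype.
        int dash = value.indexOf('-');
        if (dash < 0) {
            return new DataTypeHeader(value, null);
        }
        return new DataTypeHeader(value.substring(0, dash), value.substring(dash + 1));
    }

    public static void main(String[] args) {
        DataTypeHeader header = parse("$DATA_TYPE audio-rp");
        System.out.println(header.contentType + " / " + header.contentSubType);
    }
}
```

With the two example headers above, this sketch would yield the type "text" with no subtype, and the type "audio" with the subtype "rp".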
protected String content_subtype
protected int featureMatrixRows
protected int featureMatrixCols
protected int dim
protected boolean isNormalized
protected cern.colt.matrix.impl.DenseDoubleMatrix1D meanVector
protected double mqe0
protected int numVectors
protected Random rand
protected TemplateVector templateVector
A TemplateVector attached to this input data.
private double[][] transformedVectors
private double[][] distanceMatrix
A matrix containing the pairwise distances between two vectors. FIXME: use LeightWeightMemoryInputVectorDistanceMatrix instead
protected LinkedHashMap<String,Integer> nameCache
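The field summary describes nameCache as a mapping from the name to the index of an input vector, for faster access. The following sketch shows one way such a cache might be built from a dataNames-style array. The class `NameCacheSketch`, its methods, and the `-1` return value for unknown labels are assumptions made for illustration, not the toolbox's actual behaviour.

```java
import java.util.LinkedHashMap;

/** Sketch only: a label-to-index cache in the spirit of the nameCache field. */
class NameCacheSketch {

    /** Maps each label to its array position; insertion order is preserved. */
    static LinkedHashMap<String, Integer> buildNameCache(String[] dataNames) {
        LinkedHashMap<String, Integer> cache = new LinkedHashMap<>();
        for (int i = 0; i < dataNames.length; i++) {
            // With duplicate labels, the later index overwrites the earlier one.
            cache.put(dataNames[i], i);
        }
        return cache;
    }

    /** Returns the index for the label, or -1 if it is unknown (assumed convention). */
    static int indexOf(LinkedHashMap<String, Integer> cache, String label) {
        Integer index = cache.get(label);
        return index == null ? -1 : index.intValue();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> cache = buildNameCache(new String[] {"a", "b", "c"});
        System.out.println(indexOf(cache, "b"));
    }
}
```

A hash-based lookup avoids scanning the label array on every call, which is presumably the "faster access" the field description refers to.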
private double[][] intervals
Constructor Detail |
---|
protected AbstractSOMLibSparseInputData(String[] dataNames, int dim, boolean norm, Random rand, TemplateVector tv, SOMLibClassInformation clsInfo)
protected AbstractSOMLibSparseInputData(boolean norm, Random random)
protected AbstractSOMLibSparseInputData()
Method Detail |
---|
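public int dim()
Description copied from interface: InputData
Gets the dimension of the input data.
Specified by:
dim in interface InputData
public String getContentType()
Description copied from interface: InputData
Gets the content type.
Specified by:
getContentType in interface InputData
public String getContentSubType()
Description copied from interface: InputData
Gets the content sub-type.
Specified by:
getContentSubType in interface InputData
public int getFeatureMatrixRows()
Description copied from interface: InputData
Gets the number of rows before vectorisation.
Specified by:
getFeatureMatrixRows in interface InputData
public int getFeatureMatrixColumns()
Description copied from interface: InputData
Gets the number of columns before vectorisation.
Specified by:
getFeatureMatrixColumns in interface InputData
public cern.colt.matrix.DoubleMatrix1D getMeanVector()
Description copied from interface: InputData
Gets the mean vector of the input vectors.
Specified by:
getMeanVector in interface InputData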
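public cern.colt.matrix.DoubleMatrix1D getMeanVector(String[] labels)
Description copied from interface: InputData
Returns the mean vector of the input vectors whose labels are given in the String[] array.
Specified by:
getMeanVector in interface InputData
Parameters:
labels - label names of the input data.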
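To make the idea of a mean vector over a labelled subset concrete, here is a small sketch using plain double arrays instead of the Colt DoubleMatrix1D type the method returns. The class `MeanVectorSketch`, the method `meanOf`, and the choice to throw on unknown labels are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: component-wise mean of the vectors selected by label. */
class MeanVectorSketch {

    static double[] meanOf(double[][] data, String[] dataNames, String[] labels) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("no labels given");
        }
        // Label-to-row lookup, so that each label is resolved in constant time.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dataNames.length; i++) {
            index.put(dataNames[i], i);
        }
        double[] mean = new double[data[0].length];
        for (String label : labels) {
            Integer found = index.get(label);
            if (found == null) {
                // Assumption: unknown labels are treated as an error.
                throw new IllegalArgumentException("unknown label: " + label);
            }
            int row = found.intValue();
            for (int j = 0; j < mean.length; j++) {
                mean[j] += data[row][j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= labels.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 4}, {5, 6}};
        double[] mean = meanOf(data, new String[] {"a", "b", "c"}, new String[] {"a", "c"});
        System.out.println(mean[0] + ", " + mean[1]);
    }
}
```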
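public boolean isNormalizedToUnitLength()
Description copied from interface: InputData
Indicates whether this data set has been normalised to the unit length.
Specified by:
isNormalizedToUnitLength in interface InputData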
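For readers unfamiliar with the term, the sketch below shows what normalising a vector to unit length typically means. It assumes the Euclidean (L2) norm, which this page does not state explicitly; the class `UnitLengthSketch` and its methods are hypothetical.

```java
/** Sketch only: unit-length normalisation, assuming the Euclidean norm. */
class UnitLengthSketch {

    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Returns a scaled copy with norm 1; an all-zero vector is returned unchanged. */
    static double[] normalise(double[] v) {
        double n = norm(v);
        double[] result = v.clone();
        if (n == 0.0) {
            return result;
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= n;
        }
        return result;
    }

    /** True if the norm of v differs from 1 by at most eps. */
    static boolean isUnitLength(double[] v, double eps) {
        return Math.abs(norm(v) - 1.0) <= eps;
    }

    public static void main(String[] args) {
        double[] unit = normalise(new double[] {3, 4});
        System.out.println(unit[0] + ", " + unit[1] + " unit=" + isUnitLength(unit, 1e-12));
    }
}
```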
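public int numVectors()
Description copied from interface: InputData
Gives the size of this input data set.
Specified by:
numVectors in interface InputData
public TemplateVector templateVector()
Description copied from interface: InputData
Gets the template vector associated with this input data.
Specified by:
templateVector in interface InputData
public SOMLibClassInformation classInformation()
Description copied from interface: InputData
Gets the class info associated with this input data.
Specified by:
classInformation in interface InputData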
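public void setTemplateVector(TemplateVector templateVector)
Description copied from interface: InputData
Sets the template vector to be associated with this input data.
Specified by:
setTemplateVector in interface InputData
Parameters:
templateVector - the new template vector.
public InputDatum getInputDatum(String label)
Description copied from interface: InputData
Get an input datum with a specified label.
Specified by:
getInputDatum in interface InputData
Parameters:
label - the name of the input datum.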
public int getInputDatumIndex(String label)
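public InputDatum getRandomInputDatum(int iteration, int numIterations)
Description copied from interface: InputData
Gets a random input sample from the input data set.
Specified by:
getRandomInputDatum in interface InputData
public InputDatum[] getInputDatum(String[] labels)
Description copied from interface: InputData
Returns an array of input data with the specified labels.
Specified by:
getInputDatum in interface InputData
Parameters:
labels - the labels of the input data.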
public void transformValues(DistanceMetric metric)
Calculates the matrix of transformedVectors using DistanceMetric.transformVector(double[]) of the given metric.
Parameters:
metric - the metric to be used to transform the values.
public void initDistanceMatrix(DistanceMetric metric) throws MetricException
Calculates the distanceMatrix - careful, this is a lengthy process and should be done only if needed. Requires the matrix of transformedVectors to be initialised (e.g. via transformValues(DistanceMetric)).
Parameters:
metric - the metric to use for calculating the distances.
Throws:
MetricException - if DistanceMetric.distance(double[], double[]) encounters a problem.
public InputDatum[] getNearestN(int inputIndex, DistanceMetric metric, int number) throws MetricException
Returns the n input vectors nearest to the vector at the given index of the dataset.
Parameters:
inputIndex - the index of the vector.
metric - the metric to use for the distance comparison. Only used when the distanceMatrix is not pre-calculated.
number - the number of nearest input vectors desired.
Throws:
MetricException - if DistanceMetric.distance(DoubleMatrix1D, double[]) encounters a problem.
public ArrayList<InputDistance> getDistances(int inputIndex, DistanceMetric metric) throws MetricException
Returns the distances between the vector at the given index and the vectors of the dataset.
Parameters:
inputIndex - the index of the vector.
metric - the metric to use for the distance comparison. Only used when the distanceMatrix is not pre-calculated.
Throws:
MetricException - if DistanceMetric.distance(DoubleMatrix1D, double[]) encounters a problem.
private InputDatum[] getNNearest(ArrayList<InputDistance> distances)
private InputDatum[] getNNearest(int number, ArrayList<InputDistance> distances)
public InputDatum[] getNearestNUnsorted(int inputIndex, DistanceMetric metric, int number) throws MetricException
Throws:
MetricException
public InputDatum[] getNearestN(double[] vector, DistanceMetric metric, int number) throws MetricException
Retrieves the given number of InputDatum that are closest to the given vector.
Throws:
MetricException
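A nearest-N query of this kind can be sketched as "compute all distances, sort, take the first n". The example below does so on plain double arrays and returns indices rather than InputDatum objects. The class `NearestNSketch` is hypothetical, and Euclidean distance stands in for whatever DistanceMetric a caller would pass; how the toolbox selects the nearest items internally is not documented on this page.

```java
import java.util.Arrays;

/** Sketch only: indices of the vectors closest to a query, by Euclidean distance. */
class NearestNSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns up to 'number' indices into data, ordered from nearest to farthest. */
    static int[] nearestN(double[][] data, double[] query, int number) {
        final double[] dist = new double[data.length];
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < data.length; i++) {
            dist[i] = euclidean(data[i], query);
            order[i] = i;
        }
        // Stable sort: vectors at equal distance keep their original order.
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));
        int n = Math.max(0, Math.min(number, order.length));
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {5, 5}, {1, 1}, {2, 2}};
        System.out.println(Arrays.toString(nearestN(data, new double[] {0.9, 0.9}, 2)));
    }
}
```

Sorting all distances costs O(n log n) per query; for small values of number, a bounded heap would avoid the full sort.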
public InputDatum[] getByNameDistanceSorted(double[] vector, Collection<String> inputNames, DistanceMetric metric) throws MetricException
Retrieves the InputDatum objects corresponding to the given input names, sorted by their distance to the given vector.
Throws:
MetricException
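public double[][] getData()
Description copied from interface: InputData
Returns the input data as a double array.
Specified by:
getData in interface InputData
public double[][] getData(String className) throws SOMToolboxException
Description copied from interface: InputData
Returns the vectors of all inputs associated with the given class name.
Specified by:
getData in interface InputData
Throws:
SOMToolboxException - if no class information file is loaded.
public void setClassInfo(SOMLibClassInformation classInfo)
Specified by:
setClassInfo in interface InputData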
public double[][] getDistanceMatrix()
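This page gives no description for getDistanceMatrix(); judging by the distanceMatrix field, it holds the pairwise distances between vectors. The following sketch shows how such a matrix could be filled, and why doing so is costly: every pair of vectors is compared once. The class `DistanceMatrixSketch` is hypothetical, and the use of a symmetric Euclidean distance in place of a DistanceMetric is an assumption.

```java
/** Sketch only: symmetric pairwise distance matrix using Euclidean distance. */
class DistanceMatrixSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Entry [i][j] is the distance between vectors i and j; the diagonal stays 0. */
    static double[][] pairwise(double[][] vectors) {
        int n = vectors.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            // Assumption: the metric is symmetric, so only the upper triangle is computed.
            for (int j = i + 1; j < n; j++) {
                double d = euclidean(vectors[i], vectors[j]);
                matrix[i][j] = d;
                matrix[j][i] = d;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] m = pairwise(new double[][] {{0, 0}, {3, 4}, {0, 0}});
        System.out.println(m[0][1] + " " + m[1][2] + " " + m[0][2]);
    }
}
```

For n vectors this needs n(n-1)/2 distance computations and n² stored doubles, which grows quickly with the size of the data set.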
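public double[][] getDataIntervals()
Description copied from interface: InputData
Return the min and max values for each feature, in a matrix of dim x 2.
Specified by:
getDataIntervals in interface InputData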
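The dim x 2 layout described for getDataIntervals() can be illustrated with a short sketch over a plain double[][] data array. The class `DataIntervalsSketch` is hypothetical, and placing the minimum in column 0 and the maximum in column 1 is an assumption about the column order.

```java
/** Sketch only: per-feature minimum and maximum in a dim x 2 matrix. */
class DataIntervalsSketch {

    static double[][] intervals(double[][] data) {
        int dim = data[0].length;
        double[][] result = new double[dim][2];
        for (int j = 0; j < dim; j++) {
            result[j][0] = Double.POSITIVE_INFINITY; // assumed column 0: minimum
            result[j][1] = Double.NEGATIVE_INFINITY; // assumed column 1: maximum
        }
        for (double[] vector : data) {
            for (int j = 0; j < dim; j++) {
                result[j][0] = Math.min(result[j][0], vector[j]);
                result[j][1] = Math.max(result[j][1], vector[j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] iv = intervals(new double[][] {{1, 5}, {-2, 7}, {3, 0}});
        System.out.println(iv[0][0] + ".." + iv[0][1] + ", " + iv[1][0] + ".." + iv[1][1]);
    }
}
```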
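public Hashtable<Integer,Integer> getFeatureDensities()
Returns feature density statistics of the input data, namely a mapping from the number of input objects in which a specific feature is non-zero to the total number of features with that density.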
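Because the density mapping is easy to misread, a small sketch may help: for each feature, count the inputs in which it is non-zero, then count how many features share each such count. The class `FeatureDensitySketch` is hypothetical, and whether the toolbox also records features that are zero everywhere (density 0) is an assumption here.

```java
import java.util.Hashtable;

/** Sketch only: histogram of "number of inputs a feature is non-zero in". */
class FeatureDensitySketch {

    static Hashtable<Integer, Integer> densities(double[][] data) {
        Hashtable<Integer, Integer> result = new Hashtable<>();
        int dim = data[0].length;
        for (int j = 0; j < dim; j++) {
            int nonZero = 0;
            for (double[] vector : data) {
                if (vector[j] != 0.0) {
                    nonZero++;
                }
            }
            // Key: density of feature j; value: number of features with that density.
            // Assumption: features with density 0 are counted as well.
            result.merge(nonZero, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 0, 0}, {2, 0, 3}, {0, 0, 4}};
        System.out.println(densities(data));
    }
}
```

In the example data, the first and third features are each non-zero in two inputs and the second is non-zero in none, so the mapping contains 2 → 2 and 0 → 1.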
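public String[] getLabels()
Description copied from interface: InputData
Returns an array containing the labels of all the input data.
Specified by:
getLabels in interface InputData
public String getLabel(int index)
Description copied from interface: InputData
Return the label of the input vector at the given index.
Specified by:
getLabel in interface InputData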
public boolean equals(Object obj)
Overrides:
equals in class Object
private boolean assertEqual(Object name, Object i1, Object i2)
public static AbstractSOMLibSparseInputData create(InputDatum[] inputData, SOMLibClassInformation classInfo)
public static String getFormatName()
public static String getFileNameSuffix()
public String getDataSource()
Description copied from interface: InputData
Returns the name/URI/etc. of the data source.
Specified by:
getDataSource in interface InputData