java.lang.Object
  at.tuwien.ifs.somtoolbox.reportgenerator.DatasetInformation
public class DatasetInformation
FIXME: most probably all the methods in this class should be part of InputData and SOMLibClassInformation, respectively!
This class collects all available information about the values in the input dataset, for example from the input file and the
template vector file, and may compute some properties of its own. Its job is to provide one centralized place
where the actual report generators (the output objects) can ask for the data.
Field Summary

Modifier and Type | Field | Description
---|---|---
private SOMLibClassInformation | classInfo |
private String | classInformationFilename |
private String[] | classNames |
(package private) boolean | denseData |
private boolean[] | discrete | only an estimation - we call values discrete if they are integer values
static int | DISCRETE |
private EditableReportProperties | EP |
private InputData | inputData |
private String | inputDataFilename |
private TemplateVector | inputTemplate |
private double[] | max | holds for each dimension the maximal value
static int | MAX_VALUE |
private double[] | mean | holds for each dimension the mean value
static int | MEAN_VALUE |
private double[] | min | holds for each dimension the minimal value
static int | MIN_VALUE |
private boolean[] | only01 | we check whether there are values != 0 or 1
static int | ONLY01 |
private Vector<Integer> | selectedIndices |
private String | tvFilename |
private double[] | var | holds for each dimension the variance
static int | VAR_VALUE |
static int | ZERO_VALUE |
private int[] | zeroValues | holds for each dimension the number of 0-values.

Constructor Summary

Constructor | Description
---|---
DatasetInformation(Vector<Integer> selectedIndices, String inputDataFilename, String tvFilename, String classInformationFile, EditableReportProperties EP) | creates a new object storing information about a given dataset
DatasetInformation(Vector<Integer> selectedIndices, String inputDataFilename, String tvFilename, String classInformationFile, EditableReportProperties EP, CommonSOMViewerStateData state) |

Method Summary

Modifier and Type | Method | Description
---|---|---
private static String | applyNameFix(String target) | small helper method for getTrainingDataInfo
double | calculateAccumulatedVariance() | small helper method used to display the dimensions in the top part of the output document. It accumulates the variances and calculates their percentage of the total variance.
private void | checkDatatypes() | runs over all dimensions of the input vectors and tries to gather information about their data ranges and other properties. Information gathered: the min and max value within each dimension (this.min, this.max); whether a dimension contains only 0/1 values (this.only01); whether a dimension contains only plain integer values (this.discrete); how many 0 (= missing?) values are in each dimension (this.zeroValues). The results are stored in the appropriate arrays.
boolean | classInfoAvailable() | returns whether class information is attached to the input vectors. Does not check whether it is a valid file, only whether a String with length > 0 has been specified as path.
String | getAttributeLabel(int dim) | returns the label (that is, the name defined for an attribute in the template vector file) for the specified attribute.
boolean | getBoolDataProps(int type, int attribute) | FIXME: split this into simple single getter methods...
int[] | getClassColorRGB(int c) | returns an array of length three containing the r, g, b values of the colour used to colour the specified class
int | getClassIndexOfInput(String inputLabel) | returns the index of the class the input vector specified by its label belongs to
SOMLibClassInformation | getClassInfo() |
String | getClassInformationFilename() | returns the path of the file containing the class information
double[] | getClassMeanVector(int classId) | returns the mean vector of all input items belonging to the given class
Vector<String> | getClusterName(ClusterNode node, int clusterByValue, int nodeDepth) | Tries to name a cluster by the input data mapped to units lying within the cluster. For naming the cluster, some very simple heuristics are used: first, if there are any labels of the cluster that correspond to 0/1 attributes, and their values are all 0 (or all 1) in the cluster, the name of this attribute is included in the name of the cluster.
EditableReportProperties | getEP() | Returns the Editable Report Properties for the Semantic Report
InputData | getInputData() | returns the InputData object storing information about the input data used for training the SOM.
String | getInputDataFilename() | returns the complete filename of the file containing the input data. Complete filename means including the path.
InputDatum | getInputDatum(int d) | returns the InputDatum at the specified index
InputDatum | getInputDatum(String name) | returns the InputDatum labelled with the specified name
String[] | getInputLabelsofClass(int classId) | returns a list of labels of all input items belonging to the given class
String | getNameOfClass(int c) | returns the name of the class specified by the index
int | getNumberOfClasses() | returns the number of classes.
int | getNumberOfClassmembers(int c) | returns the number of input elements belonging to the given class. If no class information is attached to this input, -1 is returned.
int | getNumberOfInputVectors() | returns the number of input vectors used for training the SOM, that is, the number of different vectors present in the input file for the SOM training.
int | getNumberOfSelectedInputs() | returns the number of inputs the user has selected to get information about their position on the SOM
int | getNumberOfZeroValues(int index) | returns the number of input vectors that have 0 as value in the given dimension
double | getNumericalDataProps(int type, int attribute) | FIXME: split this into simple single getter methods...
double[][] | getPCAdeterminedDims() | This method calculates the most important dimensions of the dataset according to the results of a PCA, and arranges the resulting dimension indices in a new array along its first index.
int | getSelectedInputId(int index) | returns the id of the input vector at position index in the list of selected inputs. Each input vector is identified by an id, which is its index in the complete input.
String | getTemplateFilename() | returns the complete filename of the file containing the template data. Complete filename means including the path.
String[] | getTrainingDataInfo() | Returns the names of the 3 files used for training
int | getVectorDim() | returns the dimension of the input vectors, which is the same as the number of attributes used to describe the objects.
boolean | is01(int index) | returns whether the values in the given dimension are all only 0 or 1
boolean | isDiscrete(int index) | returns whether our heuristic estimates this dimension to contain discrete values. This is the case if all values in this dimension are exact integer values.
boolean | isNormalized() | returns whether the input set has been normalized (in fact, this function returns the result of InputData.isNormalizedToUnitLength())

Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int MIN_VALUE
public static final int MAX_VALUE
public static final int MEAN_VALUE
public static final int VAR_VALUE
public static final int ZERO_VALUE
public static final int ONLY01
public static final int DISCRETE
private Vector<Integer> selectedIndices
private InputData inputData
private String inputDataFilename
private String tvFilename
private TemplateVector inputTemplate
private SOMLibClassInformation classInfo
private String[] classNames
private String classInformationFilename
private EditableReportProperties EP
private boolean[] only01
private boolean[] discrete
private double[] min
private double[] max
private double[] mean
private double[] var
private int[] zeroValues
boolean denseData
Constructor Detail |
---|
public DatasetInformation(Vector<Integer> selectedIndices, String inputDataFilename, String tvFilename, String classInformationFile, EditableReportProperties EP)
selectedIndices
- Vector of indices of the input items selected for more information
inputDataFilename
- the path to the file containing the input data
tvFilename
- the path to the file containing the template vector
classInformationFile
- the path to the file containing the class information
EP
- the customized Report Features of the Semantic Report
public DatasetInformation(Vector<Integer> selectedIndices, String inputDataFilename, String tvFilename, String classInformationFile, EditableReportProperties EP, CommonSOMViewerStateData state)
Method Detail |
---|
public boolean classInfoAvailable()
public SOMLibClassInformation getClassInfo()
public int getNumberOfInputVectors()
public double[] getClassMeanVector(int classId)
classId
- the id of the class for which the mean vector shall be calculated
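As an illustration of what a per-class mean vector is, the following is a minimal sketch and not the toolbox's implementation. It assumes the input vectors are available as a plain `double[][]` and the class membership as a parallel `int[]`; the class name `ClassMeanSketch`, the method `classMeanVector`, and the choice to return `null` for an empty class are all hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: not the toolbox's implementation of getClassMeanVector. */
final class ClassMeanSketch {

    /**
     * Averages, per dimension, all vectors whose class id equals classId.
     * Returns null if no vector belongs to that class (an assumption of this sketch).
     */
    static double[] classMeanVector(double[][] vectors, int[] classIds, int classId) {
        if (vectors.length == 0) {
            return null;
        }
        double[] sum = new double[vectors[0].length];
        int members = 0;
        for (int i = 0; i < vectors.length; i++) {
            if (classIds[i] != classId) {
                continue; // vector belongs to another class
            }
            for (int d = 0; d < sum.length; d++) {
                sum[d] += vectors[i][d];
            }
            members++;
        }
        if (members == 0) {
            return null;
        }
        for (int d = 0; d < sum.length; d++) {
            sum[d] /= members; // turn the per-dimension sum into a mean
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] vectors = { {1.0, 2.0}, {3.0, 4.0}, {10.0, 20.0} };
        int[] classIds = { 0, 0, 1 };
        // prints [2.0, 3.0]
        System.out.println(Arrays.toString(classMeanVector(vectors, classIds, 0)));
    }
}
```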
public int getVectorDim()
public boolean is01(int index)
index
- the dimension (starting with 0) for which this property is requested
public boolean isDiscrete(int index)
index
- the dimension (starting with 0) for which the estimation is requested
public int getNumberOfZeroValues(int index)
index
- the dimension (starting with 0) for which the number is requested
public boolean isNormalized()
public double getNumericalDataProps(int type, int attribute)
type
- specifies the type of information to be returned: allowed are some constants defined by this class (see above)
attribute
- the index of the attribute for which the value shall be returned (starting with 0)
public boolean getBoolDataProps(int type, int attribute)
type
- specifies the type of information to be returned: allowed are some constants defined by this class (see above)
attribute
- the index of the attribute for which the value shall be returned (starting with 0)
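The type-constant dispatch used by getNumericalDataProps and getBoolDataProps, and the single getters that the FIXME asks for, might look roughly like the sketch below. This is a hedged stand-in, not the class's actual code: the numeric values of the constants, the sample data, the exception thrown for an unknown type, and the getter names `getMin`, `getMax` and `isOnly01` are assumptions made for illustration.

```java
/** Illustrative stand-in; constant values and error handling are assumptions. */
final class DataPropsSketch {
    // Hypothetical values: the documentation names these constants but does not list their values.
    static final int MIN_VALUE = 0;
    static final int MAX_VALUE = 1;
    static final int ONLY01 = 5;

    // Sample per-attribute statistics for two attributes.
    private final double[] min = {0.0, -1.5};
    private final double[] max = {1.0, 4.0};
    private final boolean[] only01 = {true, false};

    /** Dispatches on the type constant, as the documented signature suggests. */
    double getNumericalDataProps(int type, int attribute) {
        switch (type) {
            case MIN_VALUE: return getMin(attribute);
            case MAX_VALUE: return getMax(attribute);
            default: throw new IllegalArgumentException("unknown numerical type: " + type);
        }
    }

    boolean getBoolDataProps(int type, int attribute) {
        switch (type) {
            case ONLY01: return isOnly01(attribute);
            default: throw new IllegalArgumentException("unknown boolean type: " + type);
        }
    }

    // Single getters: callers no longer need to pass a type constant.
    double getMin(int attribute) { return min[attribute]; }
    double getMax(int attribute) { return max[attribute]; }
    boolean isOnly01(int attribute) { return only01[attribute]; }
}
```

With single getters, a call site reads `props.getMax(1)` instead of `props.getNumericalDataProps(MAX_VALUE, 1)`, and a wrong type constant can no longer be passed to the wrong method.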
public String getAttributeLabel(int dim)
dim
- the index within the vector of the attribute whose label shall be returned
public int getNumberOfClasses()
public String getNameOfClass(int c)
c
- the index of the class (starting with 0)
public String[] getInputLabelsofClass(int classId)
classId
- the id of the class for which the input items are requested
public int[] getClassColorRGB(int c)
c
- the index of the class for which the colour is requested
public int getNumberOfClassmembers(int c)
c
- the index of the class (starting with 0)
public int getClassIndexOfInput(String inputLabel)
public String getClassInformationFilename()
private void checkDatatypes()
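The per-dimension scan that the method summary describes for checkDatatypes() can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual method body: the data is taken as a plain `double[][]` (rows are input vectors), the class name `DatatypeCheckSketch` is hypothetical, and "discrete" is tested with `Math.rint`, which is one way to check for exact integer values.

```java
import java.util.Arrays;

/** Illustrative only: gathers min, max, only01, discrete and zero counts per dimension. */
final class DatatypeCheckSketch {
    final double[] min;
    final double[] max;
    final boolean[] only01;
    final boolean[] discrete;
    final int[] zeroValues;

    DatatypeCheckSketch(double[][] data) {
        int dim = data.length == 0 ? 0 : data[0].length;
        min = new double[dim];
        max = new double[dim];
        only01 = new boolean[dim];
        discrete = new boolean[dim];
        zeroValues = new int[dim];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        Arrays.fill(only01, true);   // assume 0/1 until a counterexample is seen
        Arrays.fill(discrete, true); // assume integer-valued until a counterexample is seen

        for (double[] row : data) {
            for (int d = 0; d < dim; d++) {
                double v = row[d];
                min[d] = Math.min(min[d], v);
                max[d] = Math.max(max[d], v);
                if (v == 0.0) {
                    zeroValues[d]++;
                }
                if (v != 0.0 && v != 1.0) {
                    only01[d] = false;
                }
                if (v != Math.rint(v)) { // not an exact integer value
                    discrete[d] = false;
                }
            }
        }
    }
}
```

Note that a dimension flagged as only01 is necessarily also flagged as discrete, since 0 and 1 are integer values.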
public InputData getInputData()
public InputDatum getInputDatum(String name)
public InputDatum getInputDatum(int d)
public int getNumberOfSelectedInputs()
public int getSelectedInputId(int index)
index
- the index of the vector in the list of selected inputs
public String getInputDataFilename()
public String getTemplateFilename()
public Vector<String> getClusterName(ClusterNode node, int clusterByValue, int nodeDepth)
node
- the node representing the cluster that shall be named
clusterByValue
- indicates whether the labels for the cluster shall be created by value (is handed unchanged to ClusterNode.getLabels(clusterByValue, boolean))
nodeDepth
- the depth of the node in the tree, whereby the root node (i.e. the cluster containing the whole map) has depth 1
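The first naming heuristic mentioned in the method summary (include an attribute in the cluster name when it is a 0/1 attribute and all its values in the cluster are 0, or all are 1) could be sketched like this. The sketch is a simplification and not the actual getClusterName logic: it looks at every 0/1 attribute rather than only those matching existing cluster labels, it takes the cluster's inputs as a plain `double[][]`, and the `label=value` naming format, the class name and the method name are invented for illustration.

```java
import java.util.Vector;

/** Illustrative only: names a cluster by its constant 0/1 attributes. */
final class ClusterNameSketch {

    /**
     * Returns "label=0" or "label=1" for every 0/1 attribute whose value is the
     * same in all input vectors of the cluster. The output format is an assumption.
     */
    static Vector<String> constantBinaryAttributes(double[][] clusterInputs, boolean[] only01, String[] labels) {
        Vector<String> name = new Vector<>();
        if (clusterInputs.length == 0) {
            return name;
        }
        for (int d = 0; d < labels.length; d++) {
            if (!only01[d]) {
                continue; // heuristic applies to 0/1 attributes only
            }
            double first = clusterInputs[0][d];
            boolean constant = true;
            for (double[] row : clusterInputs) {
                if (row[d] != first) {
                    constant = false;
                    break;
                }
            }
            if (constant) {
                name.add(labels[d] + "=" + (int) first);
            }
        }
        return name;
    }
}
```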
public double[][] getPCAdeterminedDims()
public double calculateAccumulatedVariance()
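The summary for calculateAccumulatedVariance() only says that it accumulates variances and expresses them as a percentage of the total variance. One plausible reading is sketched below; the parameters are assumptions (the real method takes none and presumably works on the object's own state), as are the class and method names and the choice to return 0 when the total variance is 0.

```java
/** Illustrative only: share of the total variance covered by a set of dimensions. */
final class AccumulatedVarianceSketch {

    /**
     * Sums the variances of the given dimensions and returns that sum as a
     * percentage of the variance summed over all dimensions.
     */
    static double accumulatedVariancePercent(double[] var, int[] dims) {
        double total = 0.0;
        for (double v : var) {
            total += v;
        }
        if (total == 0.0) {
            return 0.0; // assumption: no variance at all counts as 0 percent
        }
        double accumulated = 0.0;
        for (int d : dims) {
            accumulated += var[d];
        }
        return 100.0 * accumulated / total;
    }
}
```

For variances {4, 1, 3, 2} and the dimensions 0 and 2, the accumulated variance is 7 out of a total of 10, so the sketch returns 70 percent.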
public String[] getTrainingDataInfo()
private static String applyNameFix(String target)
public EditableReportProperties getEP()