at.tuwien.ifs.somtoolbox.data
Class SOMLibClassInformation

java.lang.Object
  extended by at.tuwien.ifs.somtoolbox.data.SOMLibClassInformation
Direct Known Subclasses:
ESOMClassInformation

public class SOMLibClassInformation
extends Object

This class provides information about class labels for the InputData input vectors.

The file format consists of a header and the content as follows:

$TYPE string, mandatory. Fixed to class_information.
$NUM_CLASSES integer, mandatory: gives the number of classes.
$CLASS_NAMES mandatory: a space-separated list of class names; the count has to be the same as in $NUM_CLASSES.
$XDIM integer, mandatory: number of units in x-direction. Fixed to 2.
$YDIM integer, mandatory: dimensionality of the class information vector; equals the number of input vectors (InputData.numVectors()).
labelName_n classIndex_n: the $YDIM mappings from input vector label name to class index, in the range [0...($NUM_CLASSES-1)].

See also an example file from the Iris data set.
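The header format above can be illustrated with a minimal Java reader. This is a hypothetical sketch: HeaderClassInfoSketch and parse are illustrative names, not part of the SOMToolbox API. It assumes whitespace-separated tokens, label names without spaces, and header lines that precede the content lines. It checks only the constraints listed above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the header-based format; not the SOMToolbox implementation. */
final class HeaderClassInfoSketch {
    int numClasses;
    int numData;
    String[] classNames;
    /** label name -> class index, kept in file order */
    final Map<String, Integer> labelToClass = new LinkedHashMap<>();

    static HeaderClassInfoSketch parse(String content) {
        HeaderClassInfoSketch info = new HeaderClassInfoSketch();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] tok = line.split("\\s+");
            if (tok.length < 2) {
                throw new IllegalArgumentException("malformed line: " + line);
            }
            switch (tok[0]) {
                case "$TYPE":
                    if (!tok[1].equals("class_information")) {
                        throw new IllegalArgumentException("unexpected $TYPE: " + tok[1]);
                    }
                    break;
                case "$NUM_CLASSES":
                    info.numClasses = Integer.parseInt(tok[1]);
                    break;
                case "$CLASS_NAMES":
                    info.classNames = Arrays.copyOfRange(tok, 1, tok.length);
                    break;
                case "$XDIM":
                    if (Integer.parseInt(tok[1]) != 2) {
                        throw new IllegalArgumentException("$XDIM must be 2");
                    }
                    break;
                case "$YDIM":
                    info.numData = Integer.parseInt(tok[1]);
                    break;
                default:
                    // Content line: labelName classIndex, with index in [0, numClasses - 1]
                    if (tok.length != 2) {
                        throw new IllegalArgumentException("expected 'labelName classIndex': " + line);
                    }
                    int idx = Integer.parseInt(tok[1]);
                    if (idx < 0 || idx >= info.numClasses) {
                        throw new IllegalArgumentException("class index out of range: " + line);
                    }
                    info.labelToClass.put(tok[0], idx);
            }
        }
        if (info.classNames == null || info.classNames.length != info.numClasses) {
            throw new IllegalArgumentException("$CLASS_NAMES count differs from $NUM_CLASSES");
        }
        if (info.labelToClass.size() != info.numData) {
            throw new IllegalArgumentException("expected " + info.numData + " mappings ($YDIM)");
        }
        return info;
    }
}
```

The final size check also catches duplicate label names, because a duplicate collapses into a single map entry.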

Alternatively, the file can use a simpler format without any header. It then consists only of lines with two tab-separated strings of the form labelName className.
The number of classes and their indices are computed automatically.
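The tab-separated variant can be read with a minimal sketch like the one below. TabSepClassInfoSketch is a hypothetical name. The sketch assigns class indices in order of first appearance, and that order may differ from the one the toolbox itself uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for headerless "labelName<TAB>className" lines. */
final class TabSepClassInfoSketch {
    final List<String> classNames = new ArrayList<>();
    /** label name -> class index */
    final Map<String, Integer> labelToClass = new LinkedHashMap<>();

    static TabSepClassInfoSketch parse(String content) {
        TabSepClassInfoSketch info = new TabSepClassInfoSketch();
        Map<String, Integer> classIndex = new LinkedHashMap<>();
        for (String line : content.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected 'labelName<TAB>className': " + line);
            }
            String label = parts[0].trim();
            String className = parts[1].trim();
            // A class seen for the first time gets the next free index
            Integer idx = classIndex.get(className);
            if (idx == null) {
                idx = classIndex.size();
                classIndex.put(className, idx);
                info.classNames.add(className);
            }
            info.labelToClass.put(label, idx);
        }
        return info;
    }

    int numClasses() {
        return classNames.size();
    }
}
```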

Finally, in the simplest form each line contains just the class label. This class is assigned to the input datum whose index equals the line number.
The number of classes and their indices are computed automatically.
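A sketch for the simplest form follows. It assumes that line i, counting from 0, holds the class of input vector i. SimpleClassInfoSketch is a hypothetical name. The sketch rejects blank lines because skipping them would shift all following indices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for files with one class label per line. */
final class SimpleClassInfoSketch {
    final List<String> classNames = new ArrayList<>();
    /** input index (line number, 0-based) -> class index */
    int[] dataClasses;

    static SimpleClassInfoSketch parse(List<String> lines) {
        SimpleClassInfoSketch info = new SimpleClassInfoSketch();
        Map<String, Integer> classIndex = new LinkedHashMap<>();
        info.dataClasses = new int[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            String className = lines.get(i).trim();
            if (className.isEmpty()) {
                throw new IllegalArgumentException("blank line at index " + i);
            }
            Integer idx = classIndex.get(className);
            if (idx == null) {
                idx = classIndex.size();
                classIndex.put(className, idx);
                info.classNames.add(className);
            }
            info.dataClasses[i] = idx;
        }
        return info;
    }
}
```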

Version:
$Id: SOMLibClassInformation.java 3888 2010-11-02 17:42:53Z frank $
Author:
Michael Dittenbach, Thomas Lidy, Rudolf Mayer

Field Summary
protected  String classInformationFileName
          The file name to read from.
private  int[] classMemberCount
          The number of inputs in each class.
private  String[] classNames
          The names of the classes.
private  ArrayList<String> classNamesTemp
           
private  int[] dataClasses
          A mapping input index => class index, for fast lookup.
private  HashMap<String,Comparable> dataHash
          Mapping class name => class index, for fast lookup.
private  String[] dataNames
           
private  ArrayList<String> dataNamesTemp
           
private static Logger logger
           
private  int numClasses
          The number of classes.
protected  int numData
          The number of input vectors.
private  org.jfree.util.PaintList paintList
           
 
Constructor Summary
SOMLibClassInformation()
          Constructor intended to be used e.g. when generating data, or when reading a file with the SOMPAKInputData.
SOMLibClassInformation(Map<String,String> classAssignment)
           
SOMLibClassInformation(String classInformationFileName)
          Creates a new class information object by trying to read the given file in both versions: with a file header (readSOMLibClassInformationFile()) and tab-separated (readTabSepClassInformationFile()).
SOMLibClassInformation(String[] classNames, String[][] dataName)
          Constructor intended to be used when generating data.
 
Method Summary
 void addItem(String label, String classname)
           
 String[] classNames()
          Returns all the distinct class names.
 int[] computeClassDistribution(String[] labelNames)
          Computes the percentages of class membership for the given label names.
 Color getClassColor(int index)
          Get the colour for the given class index.
 Color[] getClassColors()
          Get all class colours.
 int getClassIndex(String className)
          Gets the index number for a given class label.
 int getClassIndexForInput(String vectorName)
           
 String getClassName(int index)
          Gets the class label name for a given input vector index.
 String getClassName(String vectorName)
          Gets the class name for a vector name.
 String[] getClassNames()
          Returns the names of the classes.
 String[] getDataNames()
           
 String[] getDataNamesInClass(String className)
           
 String[][] getDataNamesPerClass()
          Returns an array of data names for each class.
 int getNumberOfClassMembers(int classIndex)
          Gets the number of input vectors in the given class.
 org.jfree.util.PaintList getPaintList()
          Get the class colours as PaintList.
 double getPercentageOfClassMembers(int classIndex)
           
 boolean hasClassAssignmentForName(String vectorName)
           
private  void initPaintList()
          Initialise a standard paint list.
 boolean loadClassColours(File file)
          Load colours from an external (non-classinfo) file.
static void main(String[] args)
          Method for stand-alone execution to convert a file to the SOMLibClassInformation format.
 int numClasses()
          Gets the number of classes, as read from $NUM_CLASSES, or computed.
 void processItems(boolean sort)
           
private  void readSimple()
           
protected  void readSOMLibClassInformationFile()
          Reads a class information file containing a header and class indices.
private  void readTabSepClassInformationFile()
          Reads a class information file containing no header, and tab-separated String entries for the input vector and class labels.
 void removeNotPresentElements(SOMLibSparseInputData inputData)
           
 void setClassColor(int index, Color color)
          Set the colour for the given class index.
private  void throwClassInfoReadingError(String classInformationFileName, IOException e)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

private static final Logger logger

classInformationFileName

protected String classInformationFileName
The file name to read from.


numClasses

private int numClasses
The number of classes. Either read from the file header, or computed from the number of distinct class names in the tab-separated file.


classNames

private String[] classNames
The names of the classes. Either read from the file header, or computed from the distinct class names in the tab-separated file.


classMemberCount

private int[] classMemberCount
The number of inputs in each class.


numData

protected int numData
The number of input vectors. Either read from the file header, or computed from the number of data lines in the tab-separated file.


dataNames

private String[] dataNames

dataClasses

private int[] dataClasses
A mapping input index => class index, for fast lookup.


dataHash

private HashMap<String,Comparable> dataHash
Mapping class name => class index, for fast lookup.


classNamesTemp

private ArrayList<String> classNamesTemp

dataNamesTemp

private ArrayList<String> dataNamesTemp

paintList

private org.jfree.util.PaintList paintList
Constructor Detail

SOMLibClassInformation

public SOMLibClassInformation()
Constructor intended to be used e.g. when generating data, or when reading a file with the SOMPAKInputData.


SOMLibClassInformation

public SOMLibClassInformation(Map<String,String> classAssignment)

SOMLibClassInformation

public SOMLibClassInformation(String[] classNames,
                              String[][] dataName)
Constructor intended to be used when generating data.


SOMLibClassInformation

public SOMLibClassInformation(String classInformationFileName)
                       throws SOMToolboxException
Creates a new class information object by trying to read the given file in both versions: with a file header (readSOMLibClassInformationFile()) and tab-separated (readTabSepClassInformationFile()).

Parameters:
classInformationFileName - The file to read from
Throws:
SOMToolboxException - if there is any error in the file format
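The constructor's strategy of trying one reader and falling back to another can be expressed generically. This sketch is hypothetical: FallbackReaderSketch and readWithFallback are illustrative names. It assumes each reader signals a format error with an unchecked exception, whereas the toolbox's readers declare checked exceptions such as SOMToolboxException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: try several format readers in order, return the first success. */
final class FallbackReaderSketch {
    static <T> T readWithFallback(String content, List<Function<String, T>> readers) {
        List<String> errors = new ArrayList<>();
        for (Function<String, T> reader : readers) {
            try {
                return reader.apply(content);
            } catch (RuntimeException e) {
                // Remember why this format was rejected and try the next one
                errors.add(e.getMessage());
            }
        }
        throw new IllegalArgumentException("no reader accepted the file: " + errors);
    }
}
```

Collecting the messages from every rejected reader makes the final error more useful than reporting only the last failure.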
Method Detail

getClassNames

public String[] getClassNames()
Returns the names of the classes.


getDataNamesPerClass

public String[][] getDataNamesPerClass()
Returns an array of data names for each class.
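Grouping input names by class can be sketched from the two parallel arrays described in the field details. The sketch assumes dataNames[i] is the name of input i and dataClasses[i] is its class index. DataNamesPerClassSketch and groupByClass are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build one array of input names per class index. */
final class DataNamesPerClassSketch {
    static String[][] groupByClass(String[] dataNames, int[] dataClasses, int numClasses) {
        List<List<String>> groups = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            groups.add(new ArrayList<>());
        }
        // Input order is preserved within each class
        for (int i = 0; i < dataNames.length; i++) {
            groups.get(dataClasses[i]).add(dataNames[i]);
        }
        String[][] result = new String[numClasses][];
        for (int c = 0; c < numClasses; c++) {
            result[c] = groups.get(c).toArray(new String[0]);
        }
        return result;
    }
}
```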


getDataNames

public String[] getDataNames()

throwClassInfoReadingError

private void throwClassInfoReadingError(String classInformationFileName,
                                        IOException e)
                                 throws SOMLibFileFormatException
Throws:
SOMLibFileFormatException

readTabSepClassInformationFile

private void readTabSepClassInformationFile()
                                     throws SOMToolboxException,
                                            IOException
Reads a class information file containing no header, and tab-separated String entries for the input vector and class labels.

Throws:
SOMToolboxException - if there is any error in the file format
IOException

readSimple

private void readSimple()
                 throws SOMToolboxException,
                        IOException
Throws:
SOMToolboxException
IOException

processItems

public void processItems(boolean sort)

addItem

public void addItem(String label,
                    String classname)

readSOMLibClassInformationFile

protected void readSOMLibClassInformationFile()
                                       throws IOException,
                                              SOMToolboxException
Reads a class information file containing a header and class indices.

Throws:
IOException
SOMToolboxException

numClasses

public int numClasses()
Gets the number of classes, as read from $NUM_CLASSES, or computed.

Returns:
the number of classes.

classNames

public String[] classNames()
Returns all the distinct class names.

Returns:
the class names.

getClassIndex

public int getClassIndex(String className)
Gets the index number for a given class label.

Parameters:
className - the class label.
Returns:
the index of that label.

getClassName

public String getClassName(int index)
Gets the class label name for a given input vector index.

Parameters:
index - index of the input vector.
Returns:
the name of the class.

getClassName

public String getClassName(String vectorName)
                    throws SOMLibFileFormatException
Gets the class name for a vector name.

Parameters:
vectorName - the name of the input vector.
Returns:
the name of the class.
Throws:
SOMLibFileFormatException - If there is no class information available for the given vector name/label
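A lookup that fails loudly for unknown vector names can be sketched as follows. ClassLookupSketch, assign, and MissingClassInfoException are hypothetical; the exception only stands in for SOMLibFileFormatException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a vector-name -> class-name lookup with a checked failure. */
final class ClassLookupSketch {
    /** Hypothetical stand-in for SOMLibFileFormatException. */
    static final class MissingClassInfoException extends Exception {
        MissingClassInfoException(String message) {
            super(message);
        }
    }

    private final String[] classNames;
    /** vector (label) name -> class index */
    private final Map<String, Integer> vectorToClass = new HashMap<>();

    ClassLookupSketch(String[] classNames) {
        this.classNames = classNames.clone();
    }

    void assign(String vectorName, int classIndex) {
        vectorToClass.put(vectorName, classIndex);
    }

    boolean hasClassAssignmentForName(String vectorName) {
        return vectorToClass.containsKey(vectorName);
    }

    String getClassName(String vectorName) throws MissingClassInfoException {
        Integer idx = vectorToClass.get(vectorName);
        if (idx == null) {
            throw new MissingClassInfoException("no class information for vector '" + vectorName + "'");
        }
        return classNames[idx];
    }
}
```

In this sketch, a caller that expects unlabelled vectors can check hasClassAssignmentForName first instead of catching the exception. Whether the toolbox method is meant to be used that way is an assumption.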

hasClassAssignmentForName

public boolean hasClassAssignmentForName(String vectorName)

getClassIndexForInput

public int getClassIndexForInput(String vectorName)
                          throws SOMLibFileFormatException
Throws:
SOMLibFileFormatException

getNumberOfClassMembers

public int getNumberOfClassMembers(int classIndex)
Gets the number of input vectors in the given class.

Parameters:
classIndex - the index of the class.
Returns:
the total number of inputs in that class.

getPercentageOfClassMembers

public double getPercentageOfClassMembers(int classIndex)
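getPercentageOfClassMembers is undocumented. The sketch below assumes it returns the share of all input vectors that belong to the class, in percent; it might equally return a fraction in [0, 1]. The sketch also counts class members, matching the description of getNumberOfClassMembers. ClassMemberStatsSketch and its methods are hypothetical names.

```java
/** Hypothetical sketch of per-class member counts and percentages. */
final class ClassMemberStatsSketch {
    /** Counts inputs per class from the input index -> class index mapping. */
    static int[] countMembers(int[] dataClasses, int numClasses) {
        int[] counts = new int[numClasses];
        for (int c : dataClasses) {
            counts[c]++;
        }
        return counts;
    }

    /** Assumed definition: share of all numData inputs that fall into the class, in percent. */
    static double percentageOfMembers(int[] counts, int classIndex, int numData) {
        return numData == 0 ? 0.0 : 100.0 * counts[classIndex] / numData;
    }
}
```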

getDataNamesInClass

public String[] getDataNamesInClass(String className)

computeClassDistribution

public int[] computeClassDistribution(String[] labelNames)
Computes the percentages of class membership for the given label names.
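The description speaks of percentages, but the int[] return type suggests per-class counts. This sketch therefore counts first and converts to percentages in a separate step. ClassDistributionSketch is a hypothetical name. Skipping labels without a class assignment is an assumption of the sketch.

```java
import java.util.Map;

/** Hypothetical sketch: class distribution of a subset of labels. */
final class ClassDistributionSketch {
    /** Counts how many of the given labels fall into each class; unknown labels are skipped. */
    static int[] classDistribution(String[] labelNames, Map<String, Integer> labelToClass, int numClasses) {
        int[] counts = new int[numClasses];
        for (String label : labelNames) {
            Integer idx = labelToClass.get(label);
            if (idx != null) {
                counts[idx]++;
            }
        }
        return counts;
    }

    /** Converts counts into percentages of their total. */
    static double[] toPercentages(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] pct = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            pct[i] = total == 0 ? 0.0 : 100.0 * counts[i] / total;
        }
        return pct;
    }
}
```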


initPaintList

private void initPaintList()
Initialise a standard paint list.


getPaintList

public org.jfree.util.PaintList getPaintList()
Get the class colours as PaintList.


getClassColors

public Color[] getClassColors()
Get all class colours.


getClassColor

public Color getClassColor(int index)
Get the colour for the given class index.


setClassColor

public void setClassColor(int index,
                          Color color)
Set the colour for the given class index.


loadClassColours

public boolean loadClassColours(File file)
Load colours from an external (non-classinfo) file.


removeNotPresentElements

public void removeNotPresentElements(SOMLibSparseInputData inputData)

main

public static void main(String[] args)
                 throws SOMToolboxException,
                        IOException
Method for stand-alone execution to convert a file to the SOMLibClassInformation format.

Throws:
SOMToolboxException
IOException
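Converting to the SOMLib class information format amounts to writing the header and then one labelName classIndex line per input. The sketch below is hypothetical: ClassInfoWriterSketch is an illustrative name and does not describe the behaviour of main. It assigns indices in order of first appearance. It also assumes class names contain no spaces, since $CLASS_NAMES is space-separated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical writer producing the header-based class information format. */
final class ClassInfoWriterSketch {
    static String toSOMLibFormat(Map<String, String> labelToClassName) {
        // Class indices follow the order in which class names first appear
        Map<String, Integer> classIndex = new LinkedHashMap<>();
        for (String className : labelToClassName.values()) {
            if (!classIndex.containsKey(className)) {
                classIndex.put(className, classIndex.size());
            }
        }
        StringBuilder sb = new StringBuilder();
        sb.append("$TYPE class_information\n");
        sb.append("$NUM_CLASSES ").append(classIndex.size()).append('\n');
        sb.append("$CLASS_NAMES ").append(String.join(" ", classIndex.keySet())).append('\n');
        sb.append("$XDIM 2\n");
        sb.append("$YDIM ").append(labelToClassName.size()).append('\n');
        for (Map.Entry<String, String> e : labelToClassName.entrySet()) {
            sb.append(e.getKey()).append(' ').append(classIndex.get(e.getValue())).append('\n');
        }
        return sb.toString();
    }
}
```

Passing a LinkedHashMap keeps the input order in the output. With a plain HashMap, the order of the content lines, and therefore of the class indices, is unspecified.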