Vienna University of Technology
Institute of Software Technology and Interactive Systems
Data Mining with the Java SOMToolbox

SOMLib Data Files

Ver. 1.7 - 17.08.2010 (History)

General Information

There are two different types of data files: files describing the input data used for training, and files describing the trained SOMs. All of these files are built around the same basic structure. The following sections describe each file in more detail, giving an idea of its contents and purpose as well as its structure in terms of the order of parameters and the distinction between mandatory (M) and optional (O) parameters. Furthermore, the relationships between the parameters are listed.

Input data files

SOMLib Input Vector File

Standard filename: XXX.vec or XXX.in
Produced by: Parser, Vector Generator
Modified by: -
Demo-File: demo.tfxidf

This file describes the input vectors to be used for the training process of a Self-Organizing Map. It is written by the parser or vector generator program creating the vector structure.
The file consists of two blocks: the first describes the input vectors in a way that follows the general file structure of weight vector files, and the second gives the input vectors themselves.
The file structure is identical to the SOMLib Weight Vector File. However, the semantics of the first 4 entries differ, as follows:

Parameter Entries:

The remainder of the file is identical to the SOMLib Weight Vector File.
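The two-block layout described above can be read with a few lines of Python. This is only a sketch under assumptions that the text here does not confirm: header lines are taken to have the form `$NAME value`, lines starting with `#` are treated as comments, and each data line is assumed to hold the vector components followed by a label. The parameter names in the sample (`TYPE`, `XDIM`, `YDIM`, `VEC_DIM`) and the function name `read_vector_file` are illustrative.

```python
import io


def read_vector_file(stream):
    """Split a SOMLib-style file into header parameters and labelled vectors.

    Assumed layout (not confirmed by the surrounding text): header lines
    look like "$NAME value", lines starting with "#" are comments, and
    every other line holds the vector components followed by a label.
    """
    header = {}
    vectors = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed comment lines
        if line.startswith("$"):
            parts = line[1:].split(None, 1)
            header[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
            continue
        # Data line: everything but the last token is a component.
        *components, label = line.split()
        vectors[label] = [float(c) for c in components]
    return header, vectors


# Hypothetical sample content; parameter names are assumptions.
sample = io.StringIO(
    "$TYPE vec\n"
    "$XDIM 2\n"
    "$YDIM 1\n"
    "$VEC_DIM 3\n"
    "0.1 0.0 0.9 doc_a\n"
    "0.4 0.5 0.0 doc_b\n"
)
header, vectors = read_vector_file(sample)
print(header["VEC_DIM"], vectors["doc_a"])
```

Header values are kept as strings so that a caller can decide which entries are numeric; a real reader would also check each vector's length against the declared dimensionality.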


SOMLib Template Vector File

Standard filename: XXX.tv
Produced by: Parser, Vector Generator
Modified by: -
Demo-File: demo.tv

This file describes the template vectors providing the attribute structure of the input vectors used for the training process of a Self-Organizing Map. It is written by the parser or vector generator program creating the vector structure.

Parameter Entries:

The remainder of this file lists the attributes of the vectors in 7 columns of information, as follows:


SOMLib Vector Description File

Standard filename: XXX.vec
Produced by: Parser or vector generator program
Modified by: SOM browsing software

This file describes the input vectors for a Self-Organizing Map. It is written by the parser or vector generator program and describes the properties of each vector.
The file consists of one set of attributes per vector; the attributes themselves are still subject to modification, or rather, extension. The structure of the vector descriptions generally follows that of the unit description file. Further attributes will be added as the necessity arises, especially in the context of metaphor graphics. Furthermore, the question of whether each description file should be kept as an independent file or be part of one large file comprising the whole collection has not been fully decided.
The attributes considered so far are:

Parameter Entries:

The header above describes the general file structure.
Following this block, the second block contains the following set of attributes per vector/file:


Class Information File

Standard filename: XXX.cls
Produced by: Parser or vector generator program
Modified by: SOM browsing software

This file provides a mapping from the input vectors to a class assignment. This class assignment can be used by the SOMViewer to display the class distribution over the map, using pie-charts or the class map visualisation. It is written by the parser or vector generator, such as the Audio Feature Extraction, or generated manually by the user.
The file simply contains for each input vector an assignment to a certain class. Two different formats exist:

SOMLib Class Information File

Parameter Entries:

The header above describes the general file structure.
Following this block, the second block contains a space-separated mapping from vector label to class index (vector label => class index), one vector per line.
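Reading the second block of this format could look roughly like the sketch below. The header syntax is an assumption (lines starting with `$`, with `#` for comments), because the parameter entries are not shown here, and the `$TYPE class_information` line in the sample is hypothetical. Only the space-separated `label index` lines come from the description above.

```python
def read_class_indices(lines):
    """Map each vector label to its integer class index.

    Lines starting with "$" (assumed header syntax) or "#" (assumed
    comment syntax) are skipped; all other non-empty lines are expected
    to be "label index".
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "$#":
            continue
        # Split on the last run of whitespace so the index is always last.
        label, index = line.rsplit(None, 1)
        mapping[label] = int(index)
    return mapping


# Hypothetical sample; the header line is illustrative only.
sample = [
    "$TYPE class_information",
    "song_01 0",
    "song_02 1",
    "song_03 0",
]
classes = read_class_indices(sample)
print(classes)
```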


Tab-separated Class Information File

This file format does not use a header; it simply provides a tab-separated mapping vector label => class name, one vector per line.
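Because this variant has no header, a reader only needs to split each line at the tab. The following is a minimal sketch; the function name `read_class_names` and the sample labels are made up for illustration. The `Counter` at the end shows one way to obtain the per-class counts that a class distribution display would need.

```python
import io
from collections import Counter


def read_class_names(stream):
    """Map each vector label to its class name (tab-separated, no header)."""
    mapping = {}
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        # Raises ValueError if a line contains no tab character.
        label, class_name = line.split("\t", 1)
        mapping[label] = class_name
    return mapping


# Hypothetical sample data.
sample = io.StringIO("song_01\tjazz\nsong_02\trock\nsong_03\tjazz\n")
names = read_class_names(sample)
distribution = Counter(names.values())
print(distribution.most_common())
```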


SOM files

SOMLib Unit Description File

Standard filename: XXX.unit
Produced by: SOM training program
Modified by: SOM mapping program, LabelSOM program
Demo-File: demo.unit

This file describes the units of the trained Self-Organizing Map. It is written by the SOM training program.
The file consists of two blocks: the first describes the general SOM structure, and the second gives a specific description of every unit.
The first 3 parameter entries serve as a sanity check to find out whether the given SOM map file and weight vector file match. If any of the first 3 parameters does not match, the program should print a detailed error message and exit.
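The sanity check described above might be sketched as follows, assuming the headers of both files have already been parsed into dictionaries. The parameter names used in the example (`TYPE`, `XDIM`, `YDIM`) are placeholders, since the actual first entries are defined in the parameter table; `find_mismatches` and `require_match` are hypothetical helper names.

```python
import sys


def find_mismatches(first, second, names):
    """Return one message per parameter that is missing or differs."""
    mismatches = []
    for name in names:
        a, b = first.get(name), second.get(name)
        if a is None or b is None or a != b:
            mismatches.append(f"{name}: {a!r} != {b!r}")
    return mismatches


def require_match(first, second, names):
    """Exit with a detailed error message if the headers disagree."""
    mismatches = find_mismatches(first, second, names)
    if mismatches:
        sys.exit("Header mismatch between files:\n  " + "\n  ".join(mismatches))


# Hypothetical headers; parameter names and values are illustrative.
unit_header = {"TYPE": "som", "XDIM": "10", "YDIM": "8"}
weight_header = {"TYPE": "som", "XDIM": "10", "YDIM": "6"}
problems = find_mismatches(unit_header, weight_header, ["TYPE", "XDIM", "YDIM"])
print(problems)
```

Keeping the comparison separate from the exit makes the check easy to test; a program would call `require_match` with the 3 (or, for the other SOM files, 4) leading parameter names.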

Parameter Entries:

This header describes the general SOM structure.
Following this block, the second block contains the following set of attributes per unit:


SOMLib Weight Vector File

Standard filename: XXX.wgt
Produced by: SOM init program, SOM training program
Modified by: -
Demo-File: demo.wgt

This file describes the weight vectors of the trained Self-Organizing Map. It is initially written as the result of the SOM init program, read by the SOM training program as the initialized map, and finally written by the SOM training program after training.
The file consists of two blocks: the first describes the general SOM structure, and the second gives the weight vectors of the SOM.
The first 4 parameter entries serve as a sanity check to find out whether the given SOM map file and weight vector file match. If any of the first 4 parameters does not match, the program should print a detailed error message and exit.

Parameter Entries:


SOMLib Map Description File

Standard filename: XXX.map
Produced by: SOM training program
Modified by: SOM mapping program, SOM quant-error program
Demo-File: demo.mapdescr

This file describes the basic structure of the Self-Organizing Map, giving all the parameters used in the training process. It is initially written as the result of the training process of the SOM. Additional information attributes may be added as required by various programs.

Parameter Entries:


SOMLib Quantization Error Map File

Standard filename: XXX.err
Produced by: SOM quantization error program
Modified by: -
Demo-File: none

This file describes the quantization error vectors of the trained Self-Organizing Map. It is written by the SOM quantization error program based on a trained map and given input vectors.
The file consists of two blocks: the first describes the general SOM structure, and the second gives the quantization error vectors of the SOM.
The file structure is identical to the general weight vector description file. The first 4 parameter entries serve as a sanity check to find out whether the given SOM map file and weight vector file match. If any of the first 4 parameters does not match, the program should print a detailed error message and exit.

Parameter Entries:


SOMLib Data Winner Mapping File

Standard filename: XXX.dwm
Produced by: SOM training program
Modified by: -
Demo-File: demo.wgt

This file provides information about the best-matching units for all input vectors. It is used primarily by the Smoothed Data Histograms visualisation.
The file consists of two blocks: the first describes the general SOM structure, and the second gives the winners for each input vector.
The first 4 parameter entries serve as a sanity check to find out whether the given SOM map file and weight vector file match. If any of the first 4 parameters does not match, the program should print a detailed error message and exit.

Parameter Entries:

The header above describes the general file structure. Following this block, the second block contains the winners for each input datum.


History