at.tuwien.ifs.somtoolbox.visualization.clustering
Class Cluster

java.lang.Object
  extended by at.tuwien.ifs.somtoolbox.visualization.clustering.Cluster

public class Cluster
extends Object

A cluster used in KMeans clustering. It holds a centroid and the indices of the data set instances assigned to it.

Version:
$Id: Cluster.java 3583 2010-05-21 10:07:41Z mayer $
Author:
Robert Neumayer
See Also:
KMeans

Field Summary
private  double[] centroid
           
private  DistanceMetric distanceFunction
           
private  Vector<Integer> indices
           
private static int MAX_DIM_DEBUG
           
private static int MAX_INDICES_DEBUG
           
 
Constructor Summary
Cluster()
           
Cluster(DistanceMetric distanceFunction)
           
Cluster(double[] centroid)
           
Cluster(double[] centroid, DistanceMetric distanceFunction)
           
 
Method Summary
 void addIndex(int index)
          Add the index of a data point to this cluster.
 double averageSSE(double[][] data)
          The average SSE, i.e. the SSE divided by the number of instances within this cluster.
 void calculateCentroid(double[][] data)
          Calculate the centroid of this cluster.
 double[] getCentroid()
           
 double getDistanceToCentroid(double[] instance)
          Get the distance of a given instance to this cluster's centroid.
 Vector<Integer> getIndices()
           
 int getInstanceIndexWithMaxSSE(double[][] data)
          Get the index of the instance with the maximum SSE of all instances assigned to this cluster.
 double[][] getInstances(double[][] data)
          Returns all the instances belonging to this cluster according to the given data set.
 int[] getNumberOfAttributeOccurrences(double[][] data)
          Get the numbers of occurrences of each attribute in this cluster.
 int getNumberOfInstances()
           
 void printClusterIndices()
          Prints the indices of the instances assigned to this cluster.
 void printClusterIndices(double[][] data)
          Prints the indices of the instances assigned to this cluster; takes the data set as an argument.
 void removeInstanceIndex(int instanceIndex)
          Removes the instance according to the given index.
 void setCentroid(double[] centroid)
          Set the centroid of this cluster.
 double SSE(double[][] data)
          Calculate the sum of the squared error (SSE) for this cluster.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_DIM_DEBUG

private static final int MAX_DIM_DEBUG
See Also:
Constant Field Values

MAX_INDICES_DEBUG

private static final int MAX_INDICES_DEBUG
See Also:
Constant Field Values

indices

private Vector<Integer> indices

centroid

private double[] centroid

distanceFunction

private DistanceMetric distanceFunction

Constructor Detail

Cluster

public Cluster()

Cluster

public Cluster(double[] centroid)

Cluster

public Cluster(double[] centroid,
               DistanceMetric distanceFunction)

Cluster

public Cluster(DistanceMetric distanceFunction)

Method Detail

calculateCentroid

public void calculateCentroid(double[][] data)
Calculate the centroid of this cluster. For each dimension, the values of all instances assigned to this cluster are summed up and the sum is divided by the number of assigned instances.

Parameters:
data - the data set.
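
The averaging described above can be sketched as follows. This is a minimal, self-contained illustration and not the SOMToolbox implementation: the class `CentroidSketch` and the method `meanOf` are hypothetical names, and the sketch assumes a non-empty index list and rows of equal length.

```java
import java.util.List;

final class CentroidSketch {
    // Hypothetical helper: per-dimension mean of the rows of data selected by indices.
    // Assumes indices is non-empty and all rows have the same length.
    static double[] meanOf(double[][] data, List<Integer> indices) {
        int dim = data[0].length;
        double[] centroid = new double[dim];
        // Sum up the values of all assigned instances, dimension by dimension.
        for (int idx : indices) {
            for (int d = 0; d < dim; d++) {
                centroid[d] += data[idx][d];
            }
        }
        // Divide each sum by the number of assigned instances.
        for (int d = 0; d < dim; d++) {
            centroid[d] /= indices.size();
        }
        return centroid;
    }
}
```

How the real method treats a cluster without assigned instances is not stated here, so the sketch leaves that case out.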

removeInstanceIndex

public void removeInstanceIndex(int instanceIndex)
Removes the instance according to the given index.


addIndex

public void addIndex(int index)
Add the index of a data point to this cluster.

Parameters:
index - to add.

setCentroid

public void setCentroid(double[] centroid)
Set the centroid of this cluster.

Parameters:
centroid - to set.

getCentroid

public double[] getCentroid()

getIndices

public Vector<Integer> getIndices()

getNumberOfInstances

public int getNumberOfInstances()

printClusterIndices

public void printClusterIndices(double[][] data)
Prints the indices of the instances assigned to this cluster; takes the data set as an argument.


printClusterIndices

public void printClusterIndices()
Prints the indices of the instances assigned to this cluster.


getInstances

public double[][] getInstances(double[][] data)
Returns all the instances belonging to this cluster according to the given data set.

Parameters:
data - instances.
Returns:
plain matrix of all assigned instances.
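
A minimal sketch of such a lookup, assuming the stored indices are row numbers into the data matrix. `InstancesSketch` and `select` are hypothetical names used only for illustration; whether the real method copies rows or shares them with the input is not stated here.

```java
import java.util.List;

final class InstancesSketch {
    // Hypothetical helper: collect the rows of data referenced by indices,
    // in the order in which the indices are stored.
    static double[][] select(double[][] data, List<Integer> indices) {
        double[][] result = new double[indices.size()][];
        for (int i = 0; i < indices.size(); i++) {
            // Rows are shared with data, not copied (a choice of this sketch).
            result[i] = data[indices.get(i)];
        }
        return result;
    }
}
```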

SSE

public double SSE(double[][] data)
Calculate the sum of the squared error (SSE) for this cluster. The SSE is accumulated from the distances between the cluster's centroid and all instances assigned to it.

Parameters:
data - matrix to compute the SSE for.
Returns:
the SSE value for this cluster.
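
As an illustration, the SSE can be sketched with squared Euclidean distances. The Euclidean metric is an assumption of this sketch; the actual class uses whichever DistanceMetric it was constructed with. `SseSketch`, `squaredDistance` and `sse` are hypothetical names.

```java
import java.util.List;

final class SseSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Hypothetical helper: sum of squared distances between the centroid
    // and every row of data referenced by indices.
    static double sse(double[][] data, List<Integer> indices, double[] centroid) {
        double total = 0.0;
        for (int idx : indices) {
            total += squaredDistance(data[idx], centroid);
        }
        return total;
    }
}
```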

averageSSE

public double averageSSE(double[][] data)
The average SSE, i.e. the SSE divided by the number of instances within this cluster.


getDistanceToCentroid

public double getDistanceToCentroid(double[] instance)
Get the distance of a given instance to this cluster's centroid.

Parameters:
instance - some instance.
Returns:
the distance according to the used distance function.
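
The idea of a pluggable distance function can be sketched as below. The `Metric` interface is a hypothetical stand-in and does not reproduce the SOMToolbox DistanceMetric API; `DistanceSketch`, `EUCLIDEAN` and `MANHATTAN` are likewise names invented for this example.

```java
final class DistanceSketch {
    // Hypothetical stand-in for a distance function (not the DistanceMetric API).
    interface Metric {
        double distance(double[] a, double[] b);
    }

    // Euclidean (L2) distance.
    static final Metric EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    };

    // Manhattan (L1) distance.
    static final Metric MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += Math.abs(a[d] - b[d]);
        }
        return sum;
    };

    // The result depends entirely on the metric that is passed in.
    static double distanceToCentroid(double[] instance, double[] centroid, Metric metric) {
        return metric.distance(instance, centroid);
    }
}
```

The same instance can therefore be nearer to or farther from a centroid depending on the chosen function, which is why the return value is described relative to the distance function in use.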

getNumberOfAttributeOccurrences

public int[] getNumberOfAttributeOccurrences(double[][] data)
Get the numbers of occurrences of each attribute in this cluster.

Returns:
an array with one entry per attribute, holding the number of instances in which that attribute occurs
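
A possible reading of this count is sketched below. It assumes that an attribute "occurs" in an instance when its value is non-zero (as in sparse term vectors); that interpretation is an assumption of this sketch and is not confirmed by the description above. `OccurrenceSketch` and `countOccurrences` are hypothetical names.

```java
import java.util.List;

final class OccurrenceSketch {
    // Hypothetical helper: for each attribute (column), count the assigned
    // instances whose value for that attribute is non-zero.
    static int[] countOccurrences(double[][] data, List<Integer> indices) {
        int[] counts = new int[data[0].length];
        for (int idx : indices) {
            for (int d = 0; d < counts.length; d++) {
                // Assumption: a non-zero value means the attribute occurs.
                if (data[idx][d] != 0.0) {
                    counts[d]++;
                }
            }
        }
        return counts;
    }
}
```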

getInstanceIndexWithMaxSSE

public int getInstanceIndexWithMaxSSE(double[][] data)
Get the index of the instance with the maximum SSE of all instances assigned to this cluster.
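
A minimal sketch of such a search, under stated assumptions: the error of an instance is its squared Euclidean distance to the centroid, the returned value is an index into the data set, and -1 signals an empty cluster. None of these details are confirmed by the description above; `MaxSseSketch` and `indexWithMaxSquaredError` are hypothetical names.

```java
import java.util.List;

final class MaxSseSketch {
    // Hypothetical helper: data-set index of the assigned instance that lies
    // farthest from the centroid, or -1 if no instance is assigned.
    static int indexWithMaxSquaredError(double[][] data, List<Integer> indices, double[] centroid) {
        int best = -1;
        double bestError = -1.0;
        for (int idx : indices) {
            // Squared Euclidean distance of this instance to the centroid.
            double error = 0.0;
            for (int d = 0; d < centroid.length; d++) {
                double diff = data[idx][d] - centroid[d];
                error += diff * diff;
            }
            // Strict comparison keeps the first instance on ties.
            if (error > bestError) {
                bestError = error;
                best = idx;
            }
        }
        return best;
    }
}
```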