java.lang.Object
  at.tuwien.ifs.somtoolbox.visualization.clustering.KMeans
public class KMeans
Pretty much the classic K-Means clustering. Tried to keep it simple, though.
Nested Class Summary | |
---|---|
static class | KMeans.InitType |
Field Summary | |
---|---|
protected Cluster[] | clusters |
protected double[][] | data |
private double[] | differences |
private Hashtable<Integer,Integer> | instancesInClusters |
private int | k |
private int | lastNumberOfUpdates |
private double[] | maxValues |
private double[] | minValues |
private static int | NUMBER_OF_UPDATE_RANGE |
private int | numberOfAttributes |
private int | numberOfInstances |
private static long | RANDOM_SEED |
Constructor Summary | |
---|---|
KMeans(int k, double[][] data) | Default constructor (as much defaulting as possible). |
KMeans(int k, double[][] data, KMeans.InitType initialisation) | Instantiate a new KMeans object with the given number of clusters, data and initialisation method. |
KMeans(int k, double[][] data, KMeans.InitType initialisation, DistanceMetric distanceFunction) | Construct a new K-Means clusterer with the given number of clusters, data, initialisation method and distance metric. |
Method Summary | | |
---|---|---|
private void | calculateNewCentroids() | Batch calculation of all cluster centroids. |
double[][] | getClusterCentroids() | Get a double[][] of all cluster centroids. |
Cluster[] | getClusters() | |
double[][] | getClusterVariances() | |
double[][] | getData() | |
double[] | getDifferences() | |
private int | getIndexOfClosestCluster(double[] instance) | Get the index of the closest cluster for the given instance. |
double[] | getMaxValues() | |
double[][] | getMinMaxNormalisedClusterCentroids() | Get a double[][] of all cluster centroids. |
double[][] | getMinMaxNormalisedClusterCentroidsWithin() | Get a double[][] of all cluster centroids. |
double[] | getMinValues() | |
int[][] | getOccurrenceLabels(int numberOfLabels) | Get a set of labels for the given clustering based on the occurrences of attributes within clusters, i.e. |
double | getSSE() | Get the sum of squared errors over all clusters. |
double[] | getSSEs() | Get the sum of squared errors for each cluster individually. |
private double[] | getSubstituteCentroid() | Get a new centroid for empty clusters. |
private void | initClustersEqualNumbers(DistanceMetric distanceFunction) | Cluster centres are initialised from equally sized random chunks of the input data: with 150 instances and k = 3, for example, 50 randomly chosen instances are assigned to each cluster and its centre is calculated from these (the last cluster may be larger if numInstances mod k > 0). |
private void | initClustersLinearly(DistanceMetric distanceFunction) | Linear initialisation of the cluster centres. |
private void | initClustersLinearlyOnInstances(DistanceMetric distanceFunction) | Like initClustersLinearly(DistanceMetric), but after computing the exact linear point, finds and uses the closest instance from the data set as centroid. |
private void | initClustersRandomly(DistanceMetric distanceFunction) | Calculate random centroids for each cluster. |
private void | initClustersRandomlyOnInstances(DistanceMetric distanceFunction) | Take random points from the input data as centroids. |
private void | initMinAndMaxValues() | Utility method to compute the min, max, and difference values of the data set. |
void | printCentroids() | |
void | printCentroidsShort() | |
void | printClusterIndices() | |
private void | removeEmptyClusters() | Searches for clusters which have no instances assigned. |
void | setClusterCentroids(double[][] centroids) | Initialise the cluster centres with the given centres. |
void | train() | Train for as long as instances move between clusters. |
void | train(int numberOfSteps) | Train for a certain number of steps. |
private boolean | trainingStep() | A classic K-Means training step. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected double[][] data
private int k
private int numberOfInstances
private int numberOfAttributes
private double[] minValues
private double[] maxValues
private double[] differences
private Hashtable<Integer,Integer> instancesInClusters
protected Cluster[] clusters
private static long RANDOM_SEED
private int lastNumberOfUpdates
private static final int NUMBER_OF_UPDATE_RANGE
Constructor Detail |
---|
public KMeans(int k, double[][] data)
k - number of clusters
data - the data to cluster
public KMeans(int k, double[][] data, KMeans.InitType initialisation)
k - number of clusters
data - the data to cluster
initialisation - the initialisation method used (to be chosen from InitType)
public KMeans(int k, double[][] data, KMeans.InitType initialisation, DistanceMetric distanceFunction)
k - number of clusters
data - the data set
initialisation - initialisation type
distanceFunction - an LnMetric of your choice
Method Detail |
---|
public void train(int numberOfSteps)
numberOfSteps - the number of training steps to perform
public void train()
NUMBER_OF_UPDATE_RANGE steps (5).
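A minimal sketch of the stopping rule for `train()`, assuming it repeats training steps until no instance changes cluster and additionally stops once the number of moved instances has stayed the same for `NUMBER_OF_UPDATE_RANGE` (5) consecutive steps. That second condition is only our reading of the truncated description above, not something the source confirms. The class name and the `IntSupplier` step callback (returning how many instances moved) are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class TrainLoopSketch {
    // Mirrors the documented constant value (5).
    static final int NUMBER_OF_UPDATE_RANGE = 5;

    /**
     * Runs steps until no instance moves, or until the same number of moves has
     * repeated NUMBER_OF_UPDATE_RANGE times in a row (assumed oscillation guard).
     * Returns the number of steps performed.
     */
    static int train(IntSupplier step) {
        int steps = 0;
        int lastNumberOfUpdates = -1;
        int unchangedFor = 0;
        while (true) {
            int updates = step.getAsInt(); // instances that changed cluster in this step
            steps++;
            if (updates == 0) {
                return steps; // converged: nothing moved
            }
            unchangedFor = (updates == lastNumberOfUpdates) ? unchangedFor + 1 : 0;
            lastNumberOfUpdates = updates;
            if (unchangedFor >= NUMBER_OF_UPDATE_RANGE) {
                return steps; // assumed guard: move count stuck at the same value
            }
        }
    }
}
```

The guard matters because K-Means with ties or a non-Euclidean metric can oscillate between equivalent assignments without ever reaching zero moves.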
private void removeEmptyClusters()
private boolean trainingStep()
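A minimal sketch of the assignment half of a classic K-Means step, written as a standalone static method. The class name, the signature and the use of squared Euclidean distance are assumptions for illustration, not the SOMToolbox code. Each instance is moved to its closest centroid, and the return value reports whether any instance changed cluster, which is what a loop such as `train()` needs to detect convergence.

```java
// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class TrainingStepSketch {
    /**
     * Assigns every instance to its closest centroid (squared Euclidean distance).
     * assignment[i] holds the current cluster of instance i and is updated in place.
     * Returns true if at least one instance changed cluster.
     */
    static boolean trainingStep(double[][] data, double[][] centroids, int[] assignment) {
        boolean moved = false;
        for (int i = 0; i < data.length; i++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0;
                for (int a = 0; a < data[i].length; a++) {
                    double diff = data[i][a] - centroids[c][a];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (assignment[i] != best) {
                assignment[i] = best;
                moved = true;
            }
        }
        return moved;
    }
}
```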
private void calculateNewCentroids()
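The batch centroid update can be sketched as a per-cluster mean. The class and method names are hypothetical. The sketch leaves the centroid of an empty cluster unchanged; the documented class handles empty clusters separately (see `removeEmptyClusters()` and `getSubstituteCentroid()`), so this choice is only a placeholder.

```java
// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class CentroidSketch {
    /** Each centroid becomes the mean of the instances assigned to it;
     *  centroids of empty clusters are copied unchanged. */
    static double[][] calculateNewCentroids(double[][] data, int[] assignment, double[][] old) {
        int k = old.length;
        int dim = old[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < data.length; i++) {
            int c = assignment[i];
            counts[c]++;
            for (int a = 0; a < dim; a++) sums[c][a] += data[i][a];
        }
        double[][] result = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                result[c] = old[c].clone(); // empty cluster: keep previous centroid
                continue;
            }
            result[c] = new double[dim];
            for (int a = 0; a < dim; a++) result[c][a] = sums[c][a] / counts[c];
        }
        return result;
    }
}
```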
private double[] getSubstituteCentroid()
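The source does not say how a substitute centroid for an empty cluster is chosen. One common strategy, swapped in here purely as an illustration, is to reseed the empty cluster with the instance lying farthest from the centroid it is currently assigned to. Class name, signature and strategy are all assumptions.

```java
// Hypothetical standalone sketch; the "farthest instance" strategy is an assumption.
final class SubstituteCentroidSketch {
    /** Returns a copy of the instance with the largest squared Euclidean distance
     *  to its currently assigned centroid. */
    static double[] getSubstituteCentroid(double[][] data, int[] assignment, double[][] centroids) {
        int farthest = 0;
        double maxDist = -1;
        for (int i = 0; i < data.length; i++) {
            double[] c = centroids[assignment[i]];
            double d = 0;
            for (int a = 0; a < c.length; a++) {
                double diff = data[i][a] - c[a];
                d += diff * diff;
            }
            if (d > maxDist) {
                maxDist = d;
                farthest = i;
            }
        }
        return data[farthest].clone();
    }
}
```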
private int getIndexOfClosestCluster(double[] instance)
instance - the data vector to be assigned
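Because the constructor accepts "an LnMetric of your choice", the closest-cluster lookup can be sketched with a Minkowski (L_p) distance, where p = 2 is Euclidean and p = 1 is Manhattan. The class, the method signatures and passing `p` as a plain double are hypothetical; the real `DistanceMetric` interface is not shown in this page.

```java
// Hypothetical standalone sketch, not the SOMToolbox DistanceMetric API.
final class ClosestClusterSketch {
    /** Minkowski (L_p) distance between two vectors of equal length. */
    static double lpDistance(double[] x, double[] y, double p) {
        double sum = 0;
        for (int a = 0; a < x.length; a++) sum += Math.pow(Math.abs(x[a] - y[a]), p);
        return Math.pow(sum, 1.0 / p);
    }

    /** Index of the centroid closest to the instance; ties go to the lower index. */
    static int getIndexOfClosestCluster(double[] instance, double[][] centroids, double p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = lpDistance(instance, centroids[c], p);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

The choice of p can change the result: for the origin and centroids (2, 2) and (0, 3), L2 picks the first (about 2.83 vs 3) while L1 picks the second (4 vs 3).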
public int[][] getOccurrenceLabels(int numberOfLabels)
private void initClustersRandomly(DistanceMetric distanceFunction)
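A minimal sketch of random centroid initialisation, assuming each attribute of each centroid is drawn uniformly from the data set's range for that attribute (the class keeps `minValues`/`maxValues` fields, which suggests this, but the source does not confirm it). A fixed seed, analogous to `RANDOM_SEED`, makes runs reproducible. Names are hypothetical.

```java
import java.util.Random;

// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class RandomInitSketch {
    /** Draws every centroid attribute uniformly from [min[a], max[a]) using a seeded Random. */
    static double[][] initClustersRandomly(int k, double[] min, double[] max, long seed) {
        Random rnd = new Random(seed);
        double[][] centroids = new double[k][min.length];
        for (int c = 0; c < k; c++) {
            for (int a = 0; a < min.length; a++) {
                centroids[c][a] = min[a] + rnd.nextDouble() * (max[a] - min[a]);
            }
        }
        return centroids;
    }
}
```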
private void initClustersEqualNumbers(DistanceMetric distanceFunction)
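The equal-chunks initialisation described in the method summary can be sketched as: shuffle the instance indices, cut them into k chunks of size n / k, let the last chunk absorb the n mod k leftovers, and use each chunk's mean as a centre. Class and method names and the explicit seed parameter are hypothetical; the sketch assumes n >= k.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class EqualChunksInitSketch {
    /** Centres are the means of k random, (almost) equally sized chunks of the data. */
    static double[][] initClustersEqualNumbers(double[][] data, int k, long seed) {
        int n = data.length;
        int dim = data[0].length;
        int chunk = n / k;
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        double[][] centres = new double[k][dim];
        for (int c = 0; c < k; c++) {
            int from = c * chunk;
            int to = (c == k - 1) ? n : from + chunk; // last chunk takes the remainder
            for (int j = from; j < to; j++) {
                for (int a = 0; a < dim; a++) centres[c][a] += data[idx.get(j)][a];
            }
            for (int a = 0; a < dim; a++) centres[c][a] /= (to - from);
        }
        return centres;
    }
}
```

With 150 instances and k = 3 this gives three chunks of 50; with 7 instances and k = 3 the chunk sizes are 2, 2 and 3.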
private void initClustersRandomlyOnInstances(DistanceMetric distanceFunction)
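Taking random instances as centroids amounts to sampling k distinct rows without replacement. A minimal sketch, with hypothetical names and an explicit seed; copies are returned so later centroid updates cannot modify the data set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class RandomInstancesInitSketch {
    /** Picks k distinct instances at random and copies them as centroids (requires k <= n). */
    static double[][] initClustersRandomlyOnInstances(double[][] data, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < data.length; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[idx.get(c)].clone();
        return centroids;
    }
}
```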
private void initClustersLinearly(DistanceMetric distanceFunction)
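The source only says this method "does linear initialisation". One plausible reading, sketched here as an assumption, places the k centroids evenly on the line from the per-attribute minimum to the per-attribute maximum, excluding both endpoints, using the `minValues` and `differences` (max - min) arrays the class keeps. The spacing rule and all names are hypothetical.

```java
// Hypothetical standalone sketch; the spacing rule (c + 1) / (k + 1) is an assumption.
final class LinearInitSketch {
    /** Centroid c lies at min + differences * (c + 1) / (k + 1) in every attribute. */
    static double[][] initClustersLinearly(int k, double[] min, double[] differences) {
        double[][] centroids = new double[k][min.length];
        for (int c = 0; c < k; c++) {
            for (int a = 0; a < min.length; a++) {
                centroids[c][a] = min[a] + differences[a] * (c + 1) / (k + 1);
            }
        }
        return centroids;
    }
}
```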
private void initClustersLinearlyOnInstances(DistanceMetric distanceFunction)
Like initClustersLinearly(DistanceMetric), but after computing the exact linear point, finds and uses the closest instance from the data set as centroid.
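The "snap to the closest instance" step can be sketched independently of how the linear points are computed: each precomputed point is replaced by a copy of its nearest data vector. Names and the use of squared Euclidean distance are assumptions. Note that two linear points may snap to the same instance, yielding duplicate centroids; the source does not say whether the real method guards against this.

```java
// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class SnapToInstanceSketch {
    /** Replaces each point with a copy of its nearest instance (squared Euclidean distance). */
    static double[][] snapToInstances(double[][] points, double[][] data) {
        double[][] result = new double[points.length][];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < data.length; i++) {
                double d = 0;
                for (int a = 0; a < data[i].length; a++) {
                    double diff = data[i][a] - points[p][a];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = i;
                }
            }
            result[p] = data[best].clone();
        }
        return result;
    }
}
```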
public void setClusterCentroids(double[][] centroids) throws MoreCentresThanKException
centroids - centroids for clusters.
MoreCentresThanKException - if the number of centres given differs from k; neither more nor fewer than k centres may be set.
private void initMinAndMaxValues()
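Computing the min, max and difference values amounts to one pass over the data per attribute. A minimal sketch with hypothetical names, returning the three arrays that correspond to the `minValues`, `maxValues` and `differences` fields.

```java
// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class MinMaxSketch {
    /** Returns {min, max, differences} per attribute, where differences = max - min. */
    static double[][] minMaxDiff(double[][] data) {
        int dim = data[0].length;
        double[] min = data[0].clone();
        double[] max = data[0].clone();
        double[] diff = new double[dim];
        for (double[] row : data) {
            for (int a = 0; a < dim; a++) {
                min[a] = Math.min(min[a], row[a]);
                max[a] = Math.max(max[a], row[a]);
            }
        }
        for (int a = 0; a < dim; a++) diff[a] = max[a] - min[a];
        return new double[][] {min, max, diff};
    }
}
```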
public double[][] getClusterCentroids()
public double[][] getClusterVariances()
public double[][] getMinMaxNormalisedClusterCentroids()
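The page gives no formula for the normalised centroids. Assuming standard min-max normalisation against the data set's range, each centroid attribute maps to (c - min) / (max - min). How the `...Within` variant differs is not stated, so it is not sketched. Names are hypothetical, and mapping zero-range attributes to 0 is our own choice to avoid dividing by zero.

```java
// Hypothetical standalone sketch; the normalisation formula is an assumption.
final class NormalisedCentroidSketch {
    /** Min-max normalises each centroid attribute using the data set's min and range. */
    static double[][] normalise(double[][] centroids, double[] min, double[] differences) {
        double[][] out = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            out[c] = new double[centroids[c].length];
            for (int a = 0; a < out[c].length; a++) {
                // zero range: the attribute is constant, map it to 0
                out[c][a] = differences[a] == 0 ? 0 : (centroids[c][a] - min[a]) / differences[a];
            }
        }
        return out;
    }
}
```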
public double[][] getMinMaxNormalisedClusterCentroidsWithin()
public double[] getMinValues()
public double[] getMaxValues()
public double[] getDifferences()
public Cluster[] getClusters()
public double getSSE()
public double[] getSSEs()
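The per-cluster and total sum of squared errors can be sketched as follows, assuming squared Euclidean distance between each instance and its assigned centroid (the usual K-Means objective; the source does not state which metric the real methods use). Names are hypothetical.

```java
// Hypothetical standalone sketch, not the SOMToolbox implementation.
final class SseSketch {
    /** Per-cluster sum of squared Euclidean distances between instances and their centroid. */
    static double[] getSSEs(double[][] data, int[] assignment, double[][] centroids) {
        double[] sses = new double[centroids.length];
        for (int i = 0; i < data.length; i++) {
            double[] c = centroids[assignment[i]];
            for (int a = 0; a < c.length; a++) {
                double diff = data[i][a] - c[a];
                sses[assignment[i]] += diff * diff;
            }
        }
        return sses;
    }

    /** Total SSE: the sum of the per-cluster values. */
    static double getSSE(double[][] data, int[] assignment, double[][] centroids) {
        double total = 0;
        for (double s : getSSEs(data, assignment, centroids)) total += s;
        return total;
    }
}
```

For instances 0 and 2 in a cluster centred at 1, and 10 in a cluster centred at 10, the per-cluster values are 2 and 0, so the total is 2.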
public void printCentroids()
public void printCentroidsShort()
public void printClusterIndices()
public double[][] getData()