Multi-challenge Data Set

The Multi-challenge data set demonstrates how a data analysis method copes with clusters of different densities and shapes when these characteristics occur together in a single data set. It consists of several subsets placed in a 10-dimensional space; an individual subset may occupy a subspace of lower dimension. A PCA projection of this data set is shown below.
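To compute a comparable two-dimensional PCA projection yourself, a minimal sketch with NumPy could look like the following. The function name `pca_project` and the random placeholder array are illustrative assumptions. The placeholder is not the actual data set and only stands in for an array with one 10-dimensional sample per row.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` onto its leading principal components."""
    # PCA operates on mean-centered data.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Placeholder input (assumption): 500 random points in 10 dimensions.
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(500, 10))

projected = pca_project(placeholder)  # one 2-D point per input row
```

The two columns of `projected` can be passed directly to a scatter plot, colored by the numeric subset label if one is available.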



Each subset consists of the same number of sample points, and can be described (in the order of their numeric label) as follows:

The subsets are normalized individually to zero mean and unit variance. They are then arranged on a plane, which can be seen in Figure D.2.3. The distance between the subsets is 10 times their standard deviation, that is, 10 units after normalization.
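The normalization and placement described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the published data set. The three random subsets, the per-coordinate normalization, the placement of centers along a single row, and the helper names `normalize` and `embed` are all hypothetical. Only the 10-dimensional ambient space, the equal subset sizes, and the spacing of 10 standard deviations come from the description.

```python
import numpy as np

AMBIENT_DIM = 10  # from the text: subsets are placed in a 10-dimensional space
SPACING = 10.0    # from the text: 10 standard deviations (std is 1 after normalizing)

def normalize(subset):
    """Scale each coordinate to zero mean and unit variance (assumed per coordinate)."""
    centered = subset - subset.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # a constant coordinate stays at zero
    return centered / std

def embed(subset, center_xy):
    """Zero-pad a subset to AMBIENT_DIM and shift it within the first two coordinates."""
    n, d = subset.shape
    out = np.zeros((n, AMBIENT_DIM))
    out[:, :d] = subset
    out[:, :2] += center_xy
    return out

rng = np.random.default_rng(0)
n_points = 200  # every subset has the same number of samples

# Hypothetical stand-ins for the real subsets, with differing intrinsic dimensions.
raw_subsets = [
    rng.normal(loc=5.0, scale=3.0, size=(n_points, 2)),
    rng.uniform(-1.0, 1.0, size=(n_points, 3)),
    rng.normal(size=(n_points, 10)),
]

# Assumed layout: centers placed along one row of the plane, SPACING apart.
centers = [np.array([i * SPACING, 0.0]) for i in range(len(raw_subsets))]

parts = [embed(normalize(s), c) for s, c in zip(raw_subsets, centers)]
data = np.vstack(parts)                                # shape (600, 10)
labels = np.repeat(np.arange(len(parts)), n_points)    # numeric subset labels
```

Because each subset has unit standard deviation after normalization, a center-to-center spacing of 10 keeps the subsets well separated in the plane while leaving their internal shapes unchanged.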

Downloads: