Clustering based ensemble classification for spam filtering

R. Neumayer:
"Clustering based ensemble classification for spam filtering";
in: "Proceedings of the 6th Workshop on Data Analysis", Elfa Academic Press, 2006, pp. 11-22.


Abstract. Spam filtering has become an important issue in recent years, as unsolicited bulk e-mail causes considerable problems in terms of both the time spent on it and the resources needed to filter those messages automatically. Text information retrieval offers the tools and algorithms to handle text documents in their abstract vector form, to which machine learning algorithms can then be applied. This work deals with the possible improvements gained from ensembles, i.e. multiple, differing classifiers for the same task. The individual classifiers can fit parts of the training data better and may therefore improve classification results, provided the best-fitting classifier can be found. Basic classification algorithms as well as clustering are introduced. Furthermore, the application of the ensemble idea is explained and experimental results are presented.
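
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way the idea could look in practice, not the paper's actual method. It assumes term-count vectors, a small k-means clustering, one multinomial naive Bayes classifier per cluster, and routing of each new message to the classifier of its nearest cluster centroid. All function names, the choice of algorithms, and the toy messages are illustrative assumptions.

```python
import math
from collections import Counter

def vectorize(text, vocab):
    # Term-count vector over a fixed vocabulary; unknown words are ignored.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=10):
    # Assumption: deterministic init, the first k points are the start centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

class NaiveBayes:
    # Multinomial naive Bayes with add-one (Laplace) smoothing.
    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_like[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                x * l for x, l in zip(vector, self.log_like[c]))
        return max(self.classes, key=score)

def train_ensemble(texts, labels, k=2):
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = [vectorize(t, vocab) for t in texts]
    centroids, assign = kmeans(vectors, k)
    models = {}
    for c in set(assign):
        idx = [i for i, a in enumerate(assign) if a == c]
        # One classifier per cluster, trained only on that cluster's messages.
        models[c] = NaiveBayes().fit([vectors[i] for i in idx],
                                     [labels[i] for i in idx])
    return vocab, centroids, models

def classify(text, vocab, centroids, models):
    v = vectorize(text, vocab)
    # Route the message to the classifier of the closest non-empty cluster.
    c = min(models, key=lambda c: sq_dist(v, centroids[c]))
    return models[c].predict(v)

# Toy data (invented for illustration): two topics, each with spam and ham.
texts = [
    "cheap pills buy pills", "claim money loan money",
    "doctor appointment pills", "bank statement money",
    "buy cheap pills", "claim loan money",
    "doctor pills reminder", "bank money report",
]
labels = ["spam", "spam", "ham", "ham", "spam", "spam", "ham", "ham"]

vocab, centroids, models = train_ensemble(texts, labels, k=2)
print(classify("buy cheap pills today", vocab, centroids, models))
print(classify("bank statement report", vocab, centroids, models))
```

In this sketch the clustering step separates the messages by topic, so each per-cluster classifier only has to distinguish spam from legitimate mail within one topic. Whether such a split helps on real data is an empirical question that the paper's experiments address.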