Verboseness Fission for BM25 Document Length Normalization

A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in:"ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval", ACM, 2015, ISBN: 978-1-4503-3833-2, S. 385 - 388.

[ Publication Database ]

Abstract:


BM25 is probably the most well known term weighting model in Information Retrieval. It has, depending on the formula variant at hand, 2 or 3 parameters (k1, b, and k3). This paper addresses b - the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three: multi-topicality, verboseness with word repetition (repetitiveness) and verboseness with synonyms, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, therefore removing the need for ground-truth based optimization.