WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data

Authors

  • Otávio D. A. Alcântara, Federal University of Minas Gerais
  • Alvaro R. Pereira Jr., Federal University of Minas Gerais
  • Humberto M. Almeida, Universidade Federal de Minas Gerais
  • Marcos A. Gonçalves, Universidade Federal de Minas Gerais
  • Christian Middleton, Universitat Pompeu Fabra
  • Ricardo Baeza-Yates, Dept. of Computer Science, Universidad de Chile

Keywords:

Benchmark, Clickthrough, Learning to Rank

Abstract

In this paper we present WCL2R, a benchmark collection for supporting
research in learning to rank (L2R) algorithms that exploit clickthrough
features. Unlike other L2R benchmark collections, such as LETOR
and the recently released Yahoo! collection for an L2R competition,
WCL2R focuses on defining a significant (and new) set of features over
clickthrough data extracted from the logs of a real-world search engine.
We describe the WCL2R collection by providing details about
how the corpora, queries and relevance judgments were obtained, how the
learning features were constructed, and how the collection was split
into folds for representative learning. We also analyze the
discriminative power of the WCL2R collection using traditional feature
selection algorithms and show that the most discriminative features are,
in fact, those based on clickthrough data. We then compare several L2R
algorithms on WCL2R, showing that all of them achieve significant gains
over traditional ranking approaches by exploiting clickthrough information.

Published

2010-09-14

Section

Regular Articles