Is Learning to Rank Worth it? A Statistical Analysis of Learning to Rank Methods in the LETOR Benchmarks

  • Guilherme C. M. Gomes, Universidade Federal de Minas Gerais
  • Vitor C. Oliveira, Universidade Federal de Minas Gerais
  • Jussara M. Almeida, Universidade Federal de Minas Gerais
  • Marcos A. Gonçalves, Universidade Federal de Minas Gerais
Keywords: Information Retrieval, Learning to Rank, Statistical Analysis


The Learning to Rank (L2R) research field has experienced fast-paced growth over the last few years, with a wide variety of benchmark datasets and baselines available for experimentation. Here we investigate the main assumption behind this field: that sophisticated L2R algorithms and models produce significant gains over simpler, more traditional information retrieval approaches. Surprisingly, our experimental results on the LETOR benchmarks indicate that many L2R algorithms, when compared against the best individual features of each dataset, may not produce statistically significant differences, even when the absolute gains seem large. We also find that most of the reported baselines are statistically tied, with no clear winner.

Author Biography

Guilherme C. M. Gomes, Universidade Federal de Minas Gerais
Currently an undergraduate student under Prof. Marcos Gonçalves at the Database Laboratory (LBD) in UFMG's Computer Science Department (DCC-UFMG), researching machine learning algorithms applied to document ranking and information retrieval.