On the Difficulty of Evaluating Baselines: A Study on Recommender Systems

Steffen Rendle; Li Zhang; Yehuda Koren

Abstract

Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the Movielens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Secondly, we recap the tremendous effort that was required by the community to obtain high quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.
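To make the "vanilla matrix factorization baseline" concrete, here is a minimal sketch of a biased matrix factorization model trained with plain SGD. It is a generic textbook formulation, not the authors' configuration: the function name `train_mf`, the hyperparameter values (`dim`, `lr`, `reg`, `epochs`), and the toy ratings are all assumptions made for illustration. The abstract's point is precisely that such settings need careful tuning before a baseline can be trusted.

```python
import math
import random


def train_mf(ratings, n_users, n_items, dim=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Fit a biased matrix factorization model with plain SGD.

    ratings: list of (user, item, rating) triples with 0-based integer ids.
    Returns a function predict(user, item) -> float.
    All default hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    user_bias = [0.0] * n_users
    item_bias = [0.0] * n_items
    # Small random initial embeddings for users (P) and items (Q).
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]

    def predict(u, i):
        return mu + user_bias[u] + item_bias[i] + sum(
            pu * qi for pu, qi in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient step on the L2-regularized squared error.
            user_bias[u] += lr * (err - reg * user_bias[u])
            item_bias[i] += lr * (err - reg * item_bias[i])
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return predict


# Hypothetical toy ratings (user, item, rating); not MovieLens data.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
       (2, 1, 2.0), (2, 2, 1.0), (3, 0, 5.0), (3, 2, 2.0)]
predict = train_mf(toy, n_users=4, n_items=3)

# Compare the training RMSE with a predict-the-global-mean reference.
train_rmse = math.sqrt(
    sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
mean = sum(r for _, _, r in toy) / len(toy)
mean_rmse = math.sqrt(sum((r - mean) ** 2 for _, _, r in toy) / len(toy))
print(f"train RMSE {train_rmse:.3f} vs. global-mean RMSE {mean_rmse:.3f}")
```

The comparison here is on training data only, so it shows that the model fits, not that it generalizes; a real evaluation would hold out ratings and tune the hyperparameters on a separate validation split.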

Code Repositories

tohtsky/myFM
Mentioned in GitHub

Benchmarks

Benchmark                                  Methodology                  Metrics
collaborative-filtering-on-movielens-10m   SGD MF                       RMSE: 0.772
collaborative-filtering-on-movielens-10m   Bayesian timeSVD++           RMSE: 0.7523
collaborative-filtering-on-movielens-10m   Bayesian SVD++               RMSE: 0.7563
collaborative-filtering-on-movielens-10m   U-RBM                        RMSE: 0.823
collaborative-filtering-on-movielens-10m   Bayesian timeSVD++ flipped   RMSE: 0.7485
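The benchmark entries above are reported as RMSE (root mean squared error), where lower is better. As a rough sketch of how that metric is computed, the helper below takes predicted and actual ratings; the function name `rmse` and the sample ratings are hypothetical and are not taken from the benchmark.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


# Hypothetical ratings, for illustration only (not data from the benchmark).
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))  # squared errors sum to 2.25, so this is sqrt(2.25 / 4) = 0.75
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones, which is one reason small differences in the third decimal place of the table can reflect substantial tuning effort.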
