Efficient Counterfactual Learning from Bandit Feedback

Yusuke Narita; Shota Yasui; Kohei Yata

Abstract

What is the most statistically efficient way to do off-policy evaluation and optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
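
To make the setting concrete, below is a minimal sketch of the standard inverse-propensity-weighting (IPW) estimator for off-policy evaluation from logged bandit feedback, one of the "standard estimators" the abstract refers to. The function name, variable names, and synthetic data are illustrative assumptions, not code from the paper, and the sketch does not reproduce the paper's variance-reduced estimator.

```python
import numpy as np

def ipw_policy_value(rewards, logging_probs, target_probs):
    """Standard IPW estimate of a counterfactual policy's expected reward.

    rewards       : observed rewards r_i for the logged actions a_i
    logging_probs : logging-policy propensities pi_0(a_i | x_i)
    target_probs  : counterfactual-policy probabilities pi(a_i | x_i)
    """
    # Reweight each logged reward by how much more (or less) likely the
    # counterfactual policy is to choose the logged action than the
    # logging policy was, then average over the log.
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

# Toy usage with synthetic log data (illustrative only).
rng = np.random.default_rng(0)
n = 10_000
logging_probs = rng.uniform(0.2, 0.8, size=n)  # pi_0(a_i | x_i)
target_probs = rng.uniform(0.2, 0.8, size=n)   # pi(a_i | x_i)
rewards = rng.binomial(1, 0.3, size=n).astype(float)
print(ipw_policy_value(rewards, logging_probs, target_probs))
```

The paper's contribution is to identify estimators, within a wide class containing IPW, that attain the lowest variance; the sketch above only fixes the baseline that is being improved upon.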

Benchmarks

Benchmark                            Metrics
causal-inference-on-idhp-            Average Treatment Effect Error: -0.225
visual-object-tracking-on-vot2014-   Expected Average Overlap (EAO): 1.047
