Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

Haanvid Lee, Jongmin Lee, Yunseon Choi, +3 authors, Kee-Eung Kim
Oct 2022
Abstract
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as prescription of treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for the deterministic target policy with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on the analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work takes a step further, being capable of handling vector action spaces and metric optimization. We show that our estimator is consistent, and significantly reduces the MSE compared to baseline OPE methods through experiments on various domains.
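The core idea from the abstract — relaxing a deterministic target policy with a kernel so that importance weights are well defined for continuous actions — can be sketched as follows. This is a minimal illustration, not the paper's method: the metric matrix `H` is taken as given here (the paper derives the MSE-optimal metric analytically), the kernel is assumed Gaussian, and all function names (`kernel_ope_estimate`, `target_policy`, `behavior_pdf`) are hypothetical.

```python
import numpy as np

def kernel_ope_estimate(contexts, actions, rewards, behavior_pdf, target_policy, H):
    """Kernel-relaxed OPE sketch for a deterministic target policy pi(x).

    The Dirac delta at pi(x) is replaced by a Gaussian kernel K_H centered
    at pi(x), where H plays the role of a (learned) bandwidth/metric matrix.
    behavior_pdf holds the behavior policy's density at each logged action.
    """
    d = actions.shape[1]
    H_inv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    # Deviation of each logged action from the target policy's action.
    diffs = actions - np.array([target_policy(x) for x in contexts])
    # Gaussian kernel with metric H: K_H(u) ∝ exp(-0.5 * u^T H^{-1} u).
    k = norm * np.exp(-0.5 * np.einsum("ni,ij,nj->n", diffs, H_inv, diffs))
    w = k / behavior_pdf  # kernel-relaxed importance weights
    # Self-normalized estimate of the target policy's value.
    return np.sum(w * rewards) / np.sum(w)
```

Shrinking `H` concentrates the kernel around `pi(x)` (lower bias, higher variance); enlarging it does the opposite — the bias-variance trade-off the learned metric is meant to balance.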