# On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

Mar 2023

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta>0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta / \sqrt{d})$, with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O(d^2/\Delta)$ as in the well-specified setting up to logarithmic factors. In addition, we show that an existing algorithm, SupLinUCB (Chu et al., 2011), can also achieve a gap-dependent constant regret bound without knowledge of the sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our result suggests an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O(\Delta / \sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde \Omega(\Delta / \sqrt{d})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
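The abstract's data selection idea, selecting only contextual vectors with large uncertainty for online regression, can be sketched as follows. This is an illustrative implementation, not the paper's algorithm: the uncertainty measure is the standard elliptic norm $\sqrt{x^\top A^{-1} x}$ from ridge-regression confidence sets, and the threshold `gamma` and regularizer `lam` are hypothetical parameters chosen for demonstration.

```python
import numpy as np

class UncertaintyFilteredRegressor:
    """Online ridge regression that only ingests high-uncertainty contexts.

    A minimal sketch of uncertainty-based data selection: a context x is
    added to the regression only when sqrt(x^T A^{-1} x) exceeds `gamma`.
    Parameter names and values here are illustrative assumptions.
    """

    def __init__(self, d, lam=1.0, gamma=0.5):
        self.A = lam * np.eye(d)   # regularized Gram matrix
        self.b = np.zeros(d)       # running sum of reward-weighted contexts
        self.gamma = gamma         # uncertainty threshold for selection

    def uncertainty(self, x):
        # Elliptic norm sqrt(x^T A^{-1} x): large when x is poorly covered
        # by previously selected contexts.
        return float(np.sqrt(x @ np.linalg.solve(self.A, x)))

    def maybe_update(self, x, reward):
        # Select the sample for regression only if it is sufficiently uncertain.
        if self.uncertainty(x) > self.gamma:
            self.A += np.outer(x, x)
            self.b += reward * x
            return True   # sample selected
        return False      # sample discarded

    def theta_hat(self):
        # Current ridge-regression estimate of the linear reward parameter.
        return np.linalg.solve(self.A, self.b)
```

Repeatedly observing contexts in the same direction shrinks their uncertainty, so later duplicates are discarded; this caps how much any single direction of the context space can contribute to the regression.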
