A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
Jun 2022
Abstract
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtaining best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
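To make the two graph quantities in the bounds concrete, the following is a minimal brute-force sketch for a toy graph. It is not the paper's algorithm. It assumes the undirected version is obtained simply by ignoring edge directions, which may differ from the paper's "suitably defined" construction, and all function names are hypothetical. The enumeration is exponential in the number of actions, so it is only meant for very small graphs.

```python
from itertools import combinations

def independent_sets(n, edges):
    """Yield every independent set of the graph on vertices 0..n-1.

    Assumption: edge directions are ignored, so (u, v) and (v, u)
    are treated as the same undirected edge. Self-loops are dropped.
    """
    adjacent = {frozenset(e) for e in edges if e[0] != e[1]}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in adjacent for p in combinations(subset, 2)):
                yield subset

def independence_number(n, edges):
    """Size of the largest independent set (the alpha in sqrt(alpha * T))."""
    return max(len(s) for s in independent_sets(n, edges))

def max_inverse_gap_sum(n, edges, gaps):
    """Max over independent sets S of the sum of 1 / gaps[i] for i in S.

    Actions with zero gap (optimal actions) are skipped in the sum.
    """
    return max(
        sum(1.0 / gaps[i] for i in s if gaps[i] > 0)
        for s in independent_sets(n, edges)
    )

# Toy example: four actions on a directed path 0 -> 1 -> 2 -> 3.
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_gaps = [0.0, 0.5, 0.25, 0.1]  # action 0 is optimal (gap 0)

alpha = independence_number(4, toy_edges)
gap_term = max_inverse_gap_sum(4, toy_edges, toy_gaps)
```

In this toy graph the independent set {1, 3} maximizes the inverse-gap sum, which illustrates that the stochastic bound depends on where the small gaps sit in the graph and not only on the independence number.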
