The main idea in proving regret bound for the Follow the Perturbed Leader (FPL) algorithm is as follows: first, we define the hypothetical leader which knows the future cost vector before prediction on round which attains zero regret. Then we can prove that by adding some perturbation to the current cumulative cost vectors till time the Player gets regret bound. Now the question is that if we have some partial information about future cost or we are allowed to query some information about the next round's cost, can we design algorithms with improved regret bounds. This problem is tackled down in two recent papers. In Online Learning with Prior Knowledge the authors proposed an algorithm which utilize such information as a kind of prior information. Recently in Online Learning with Queries a variant of Weighted Majority Algorithm with capability of querying limited information about the future cost vector is proposed.
No comments:
Post a Comment