Recently, we set out to find variation-based regret bounds for the general Online Convex Optimization (OCO) problem. Chapter 2 of *Prediction, Learning, and Games* (Cesa-Bianchi and Lugosi) gives three different ways to express a regret bound: in terms of the number of rounds T, in terms of the loss of the best expert, and in terms of the variation of the cost vectors. In *Extracting certainty from uncertainty: regret bounded by variation in costs*, Hazan and Kale extended the variation-based result to online linear optimization and showed that its regret can be bounded by the total variation of the cost vectors.

In our work, we extend this result to general OCO. We first analyze the limitations of the algorithm from *Extracting certainty from uncertainty* when it is applied to online convex optimization. We then present two algorithms for online convex optimization whose regret is bounded by the variation of the cost functions. Finally, we consider the bandit setting and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that this bandit regret bound is optimal when the variation of the cost functions is independent of the number of rounds.
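To make "regret bounded by variation" concrete, here is a minimal sketch for the linear special case. It assumes linear costs f_t(x) = ⟨f_t, x⟩ over the unit Euclidean ball and measures variation as the sum of squared deviations of the cost vectors from their mean, which is the notion used for online linear optimization. The learner is plain projected online gradient descent with a hypothetical fixed step size `eta`; it is only a baseline for measuring regret, *not* one of the algorithms from our paper. The helper names (`total_variation`, `project_unit_ball`, `ogd_regret`) are illustrative.

```python
import math
import random


def total_variation(costs):
    """Variation of cost vectors: sum_t ||f_t - mu||^2, with mu their mean."""
    T, d = len(costs), len(costs[0])
    mu = [sum(f[i] for f in costs) / T for i in range(d)]
    return sum(sum((f[i] - mu[i]) ** 2 for i in range(d)) for f in costs)


def project_unit_ball(x):
    """Euclidean projection onto {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


def ogd_regret(costs, eta):
    """Run projected online gradient descent on linear losses <f_t, x> over the
    unit ball and return its regret against the best fixed point in hindsight."""
    d = len(costs[0])
    x = [0.0] * d
    learner_loss = 0.0
    for f in costs:
        # Suffer the loss of the current point, then step against the gradient f.
        learner_loss += sum(fi * xi for fi, xi in zip(f, x))
        x = project_unit_ball([xi - eta * fi for xi, fi in zip(x, f)])
    # min over ||x|| <= 1 of <sum_t f_t, x> equals -||sum_t f_t||_2.
    total = [sum(f[i] for f in costs) for i in range(d)]
    best_loss = -math.sqrt(sum(v * v for v in total))
    return learner_loss - best_loss


if __name__ == "__main__":
    random.seed(0)
    T, d, eta = 1000, 5, 0.05
    base = [random.gauss(0, 1) for _ in range(d)]
    # Low variation: small perturbations of one fixed cost vector.
    calm = [[b + random.gauss(0, 0.01) for b in base] for _ in range(T)]
    # High variation: independent cost vectors every round.
    noisy = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    for name, costs in (("calm", calm), ("noisy", noisy)):
        print(name, total_variation(costs), ogd_regret(costs, eta))
```

Running both sequences with the same T shows the variation term, rather than T alone, separating the easy sequence from the hard one; the variation-based bounds discussed above are statements about how regret can scale with that quantity.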
