Tuesday, January 8, 2013

Unified Algorithms for Stochastic and Adversarial Online Learning

Algorithms for online decision making have been developed independently in three different settings:
  • Stochastic Rewards
  • Adversarial Rewards
  • Variation-based Regret Bounds
Typically, algorithms such as UCB and Hedge attain $O(\log T)$ and $O(\sqrt{T})$ regret bounds in the stochastic and adversarial settings, respectively, and these bounds are known to be optimal and tight. Unfortunately, each of these algorithms is developed for one specific setting and cannot be applied to the other. For example, when UCB is provided with adversarial rewards, or Hedge with stochastic rewards, they do not attain the optimal bound known for that setting.
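For concreteness, here is a minimal sketch of UCB1 in Python; the Bernoulli arm means and the horizon are made-up illustration values:

```python
import numpy as np

def ucb1(pull, K, T):
    """Minimal UCB1 sketch: play each arm once, then pick the arm
    maximizing its empirical mean plus a confidence radius."""
    counts = np.zeros(K)
    means = np.zeros(K)
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                      # initialization: each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = pull(arm)                        # stochastic reward in [0, 1]
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        total += r
    return total

# Two Bernoulli arms with (made-up) means 0.5 and 0.6:
rng = np.random.default_rng(0)
print(ucb1(lambda a: float(rng.random() < [0.5, 0.6][a]), K=2, T=10_000))
```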

Algorithms with variation-based bounds try to bridge the gap from the adversarial setting to the stochastic setting by bounding the regret by the variation of the cost/reward vectors (see Hazan and Kale's work at COLT 2008 and our work at COLT 2012). So it has been an open question whether there is a single algorithm that works well in both settings. Such an algorithm needs to adapt to the uncertainty in the sequence of rewards, and is expected to attain an $O(\log T)$ regret bound when faced with stochastic rewards and an $O(\sqrt{T})$ regret bound against adversarial rewards.

Recently, the FlipFlop algorithm, a combination of Follow-the-Leader (FTL) and AdaHedge, made progress toward solving this issue, but there is still room for better algorithms. FlipFlop retains the $O(\sqrt{T})$ regret bound in the worst case, has better guarantees in the stochastic setting, and its performance has been shown to never be substantially worse than that of FTL.
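As a point of reference, here is a minimal sketch of the FTL component only (not the full FlipFlop algorithm, which also needs AdaHedge and the flip/flop schedule); the loss matrix is a made-up illustration:

```python
import numpy as np

def follow_the_leader(losses):
    """FTL sketch: at each round, follow the expert with the smallest
    cumulative loss so far (ties broken by lowest index)."""
    T, K = losses.shape
    cum = np.zeros(K)
    total = 0.0
    for t in range(T):
        leader = int(np.argmin(cum))   # best expert on rounds 1..t-1
        total += losses[t, leader]     # suffer the leader's loss
        cum += losses[t]               # then observe the full loss vector
    return total - float(cum.min())    # regret vs. best fixed expert

# On i.i.d. losses FTL typically enjoys small regret; on carefully
# alternating adversarial losses it can suffer linear regret, which is
# why FlipFlop needs the Hedge fallback.
rng = np.random.default_rng(1)
print(follow_the_leader(rng.random((1000, 5))))
```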

Challenging Problem: develop an efficient unified algorithm that attains $O(\log T)$ and $O(\sqrt{T})$ regret bounds in the stochastic and adversarial settings, respectively. Such an algorithm need not be a combination of algorithms from the two settings; it could be an entirely novel algorithm with the stated guarantees.

Saturday, November 10, 2012

NIPS 2012 Papers on Online Learning and Convex Optimization

Here is the list of interesting papers accepted in NIPS 2012 related to Online Learning and Convex Optimization:

A. Online Learning

- No-Regret Algorithms for Unconstrained Online Convex Optimization
- Mirror Descent Meets Fixed Share (and feels no regret)
- Relax and Randomize : From Value to Algorithms
- Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
- Confusion-Based Online Learning and a Passive-Aggressive Scheme
- Risk-Aversion in Multi-armed Bandits
- Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

B. Convex Optimization

- Optimal Regularized Dual Averaging Methods for Stochastic Optimization
- Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form
- Query Complexity of Derivative-Free Optimization
- Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
- A quasi-Newton Proximal Splitting Method
- Stochastic Gradient Descent with Only One Projection
- A Stochastic Gradient Method with an Exponential Convergence Rate with Finite Training Sets
- Communication/Computation Tradeoffs in Consensus-Based Distributed Optimization
- Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
- Feature Clustering for Accelerating Parallel Coordinate Descent

Tuesday, October 23, 2012

Online Convex Optimization with Switching Costs

Recently, at COLT 2012, an excellent paper entitled Unified Algorithms for Online Learning and Competitive Analysis made progress toward unifying the algorithms from two different communities: online learning and competitive analysis. There are a few crucial differences in the assumptions and goals of these two frameworks for online decision making, which is why there has not been much research on connecting the two. First, in competitive analysis (a.k.a. online algorithms) there is a notion of state, usually modeled as a switching or movement cost. Second, in online algorithms one assumes that the cost vector for the next round is known to the algorithm before it makes its decision; in contrast, in online learning, the decision maker first commits to a decision and only then observes the cost vector. One interesting question in connecting these two settings would be adding the switching cost to the regret incurred by the online learner. More specifically, one may consider $\|x_t - x_{t-1}\|$ as the movement cost at round $t$. Obviously, the decision maker needs to take the switching cost into account in order to minimize it. With this formulation, the learner competes with the best offline decision maker with bounded switching cost. When the bound on the switching cost is of the order of the number of rounds $T$, we recover the standard setting of online learning. I will post more about this problem and some solutions soon, hopefully :-)
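As a toy illustration (my own formulation, not the paper's algorithm), the following snippet computes the regret of a sequence of decisions on the simplex once the movement cost $\|x_t - x_{t-1}\|$ is charged on top of the linear hitting costs:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 100, 3
costs = rng.random((T, d))                   # linear cost vectors c_t
decisions = rng.dirichlet(np.ones(d), T)     # learner's points x_t on the simplex
comparator = np.eye(d)[int(np.argmin(costs.sum(axis=0)))]  # best fixed vertex

hitting = float(np.einsum('td,td->', costs, decisions))    # sum_t <c_t, x_t>
moving = float(np.sum(np.linalg.norm(np.diff(decisions, axis=0), axis=1)))
offline = float(costs.sum(axis=0) @ comparator)            # best fixed decision
print("regret including switching cost:", hitting + moving - offline)
```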

Stochastic Gradient Descent with only ONE Projection

With the increasing amount of data available for training, it has become an urgent task to devise efficient algorithms for optimization/learning problems of unprecedented size. Online learning algorithms, such as the celebrated Stochastic Gradient Descent (SGD) and its counterpart Online Gradient Descent (OGD), despite their slow rate of convergence compared with batch methods, have been shown to be very effective for large-scale and online learning problems, both theoretically and empirically. Although a large number of iterations is usually needed to obtain a solution of desirable accuracy, the lightweight computation per iteration makes SGD attractive for many large-scale learning problems. To find a solution within the domain $\mathcal{K}$ that optimizes a given objective function $f(\mathbf{x})$, SGD computes an unbiased estimate of the gradient of $f(\mathbf{x})$ and updates the solution by moving it in the opposite direction of the estimated gradient. To ensure that the solution stays within the domain $\mathcal{K}$, SGD has to project the updated solution back into $\mathcal{K}$ at every iteration. For complex domains, such as the positive semidefinite (PSD) cone in metric learning or the set of bounded trace norm matrices in matrix completion, the projection step requires solving an expensive convex optimization problem, leading to a high computational cost per iteration and consequently making SGD unappealing for large-scale optimization over such domains. For instance, projecting a matrix onto the PSD cone requires computing the full eigen-decomposition of the matrix, whose complexity is cubic in the size of the matrix. The question is: can we somehow reduce the number of projection steps? Our recent work solves this problem and will appear in NIPS 2012.
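To see where the cost comes from, here is a minimal sketch of the standard projected SGD baseline over the PSD cone (this is the baseline being criticized, not the one-projection algorithm of our paper; the toy objective, step size, and noise level are made-up illustration choices):

```python
import numpy as np

def project_psd(X):
    """Projection onto the PSD cone: needs a FULL eigen-decomposition,
    which is O(d^3) -- this is the expensive per-iteration step."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def projected_sgd(grad, X0, T, eta=0.05):
    X = X0
    for _ in range(T):
        X = project_psd(X - eta * grad(X))   # projection at EVERY iteration
    return X

# Toy objective ||X - A||_F^2 with a noisy (unbiased) gradient; A is made up.
rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d)); A = (A + A.T) / 2
grad = lambda X: 2 * (X - A) + 0.01 * rng.standard_normal((d, d))
X = projected_sgd(grad, np.eye(d), T=300)
print("distance to PSD projection of A:", np.linalg.norm(X - project_psd(A)))
```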

Wednesday, September 5, 2012

Hedging when the Number of Good Experts is Small

In many online decision making algorithms, such as the Weighted Majority and Hedge algorithms, we treat all the experts in the same way, and the regret bounds depend on the number of experts. But in many applications we know that the number of good experts (i.e., experts with small cumulative loss) is small, or that the experts behave similarly enough that we can group them into a small number of clusters. This raises the question: can we bound the regret in terms of the number of good experts, or the number of clusters of experts? Although in the full information setting the regret bound depends only logarithmically on the number of experts, a bound with logarithmic dependence on the number of good experts would make a significant difference in applications with combinatorial structure or an exponential number of decisions.
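For reference, here is a minimal sketch of Hedge; the logarithmic dependence on the number of experts enters through the uniform prior, and the learning rate is the standard tuning (the loss matrix is a made-up illustration):

```python
import numpy as np

def hedge(losses):
    """Minimal Hedge sketch: exponentially weighted average over N experts.
    The log N in the regret bound comes from the uniform prior."""
    T, N = losses.shape
    eta = np.sqrt(8 * np.log(N) / T)          # standard tuning
    w = np.full(N, 1.0 / N)                   # uniform prior over experts
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        total += float(p @ losses[t])         # learner's expected loss
        w *= np.exp(-eta * losses[t])         # exponential weight update
    return total - float(losses.sum(axis=0).min())  # regret vs. best expert

rng = np.random.default_rng(0)
print(hedge(rng.random((1000, 10))))          # made-up i.i.d. losses
```

Roughly speaking, a bound in terms of the number of good experts would correspond to replacing the uniform prior above with one that puts more mass on the good experts, which is exactly the information we do not have in advance.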

Friday, July 27, 2012

Online Learning vs Online Algorithms

Recently, I got a chance to go through an excellent paper that appeared in COLT 2012, entitled Unified Algorithms for Online Learning and Competitive Analysis. When you read the literature on online learning and try to make a connection to online algorithms, you may end up with only vague thoughts and no concrete relation (at least this happened to me when I was reading chapter 5 of the book Prediction, Learning, and Games). Although Avrim Blum et al. did excellent work on relating learning from expert advice to metrical task systems (MTS), the current paper can really be considered a breakthrough in relating two inherently different problems with similar structure. Another nice thing about this paper is that it was done by a combination of researchers from the two fields, i.e., online learning and competitive analysis. I really liked the paper; it is one of the papers I had been looking forward to seeing for the last two years :-)

Thursday, June 7, 2012

Online Learning with a Budgeted Learner

To motivate this problem, let us consider the online marketing problem. In the standard setting, a seller has an unlimited number of goods with zero copy cost and attends to customers one by one. For each customer the seller proposes a price; if the customer is happy with that price (i.e., the price is less than or equal to his valuation, which is unknown to the seller), he/she buys the product; otherwise, the seller moves on to the next customer. We can cast this problem as a multi-armed bandit (MAB) problem by discretizing the prices and treating each price as an arm. But this setting fails to model real situations, since it cannot capture the following scenarios:

(1) There exists a limited number of goods to be sold
(2) There is a cost associated with each round, e.g., the time the seller spends
(3) The seller is supposed to sell at least a fixed predefined number of products

  The second and third scenarios can be modeled as online learning with constraints, which has recently attracted a lot of interest in the learning community. However, in this post, our main focus is on the first case, where the learner/seller has a limited number of goods. Now the question is how the learner can maximize his revenue given that he has a fixed number of goods to sell. This problem is more involved since it combines several problems into one: online learning, pricing strategies, and budgeted learning. I have not checked the literature to find out whether there is any proposed solution, but I believe it is an interesting problem to investigate. The first question one would ask at first glance is: can we reduce this problem to existing MAB problems and utilize known algorithms such as UCB in the stochastic setting and EXP3 in the adversarial setting to tackle it?
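As a first stab at that question, here is a naive sketch (my own toy reduction, not an algorithm from the literature): discretize the prices, run UCB1 on the revenue, and simply stop when the stock runs out. Note that it ignores the inventory constraint during exploration, which is exactly where the real difficulty lies:

```python
import numpy as np

def posted_price_ucb(prices, valuation_sampler, inventory, T):
    """Naive sketch: each price is an arm; reward = price * 1{sale}.
    Stops as soon as the limited stock of goods is exhausted."""
    K = len(prices)
    counts = np.zeros(K)
    means = np.zeros(K)
    revenue, stock = 0.0, inventory
    for t in range(1, T + 1):
        if stock == 0:
            break                              # inventory exhausted
        if t <= K:
            arm = t - 1                        # play each price once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        sale = valuation_sampler() >= prices[arm]
        r = prices[arm] if sale else 0.0
        if sale:
            stock -= 1
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        revenue += r
    return revenue

# Hypothetical example: valuations uniform on [0, 1], 50 goods in stock.
rng = np.random.default_rng(0)
print(posted_price_ucb(np.linspace(0.1, 0.9, 9), rng.random, 50, 2000))
```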