tag:blogger.com,1999:blog-61241554417492335192023-06-20T06:13:45.963-07:00Online and Game Theoretic LearningA range of thoughts on online and game theoretic learning, and machine learning.mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.comBlogger18125tag:blogger.com,1999:blog-6124155441749233519.post-83008687960312071212013-01-08T08:14:00.002-08:002013-01-09T18:34:08.364-08:00Unified Algorithms for Stochastic and Adversarial Online Learning<div style="text-align: justify;">
Algorithms for online decision making have been developed independently in three different settings:</div>
<ul style="text-align: justify;">
<li>Stochastic Rewards </li>
<li>Adversarial Rewards</li>
<li>Variation-based Regret Bounds</li>
</ul>
<div style="text-align: justify;">
Algorithms such as <b>UCB</b> and <b>Hedge</b> attain $O(\log T)$ and $O(\sqrt{T})$ regret bounds in the stochastic and adversarial settings, respectively, and both bounds are known to be tight. Unfortunately, each of these algorithms is designed for its specific setting and cannot be applied to the other: when UCB is run on adversarial rewards, or Hedge on stochastic rewards, neither attains the optimal bound known for that setting. </div>
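To make the Hedge side of this concrete, here is a minimal sketch of the multiplicative-weights update (the learning rate $\eta$ and the toy losses are illustrative choices of mine, not tuned values from any paper):

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Hedge / multiplicative weights: each round, every expert's weight is
    multiplied by exp(-eta * loss) and the weights are renormalized."""
    n = len(loss_rounds[0])
    w = [1.0 / n] * n  # uniform prior over the n experts
    for losses in loss_rounds:
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w

# expert 0 always suffers loss 0, expert 1 always loss 1:
# the weight quickly concentrates on the good expert
w = hedge([[0.0, 1.0]] * 10)
```

With a fixed tuning $\eta \sim \sqrt{\log n / T}$ this update yields the $O(\sqrt{T})$ adversarial bound mentioned above; it is exactly this fixed, setting-agnostic tuning that keeps plain Hedge from exploiting stochastic rewards.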
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Algorithms with variation-based bounds try to bridge the gap between the adversarial and stochastic settings by bounding the regret in terms of the variation of the cost/reward vectors (see <a href="http://colt2008.cs.helsinki.fi/papers/46-Hazan.pdf">Hazan and Kale's work</a> at COLT 2008 and our work at <a href="http://arxiv.org/abs/1111.6337">COLT 2012</a>). It has therefore remained an open question whether a single algorithm can work well in both settings. Such an algorithm needs to adapt to the uncertainty in the sequence of rewards, and is expected to attain an $O(\log T)$ regret bound when faced with stochastic rewards and an $O(\sqrt{T})$ regret bound against adversarial rewards. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Recently, the <a href="http://arxiv.org/abs/1301.0534">FlipFlop</a> algorithm, a combination of the Follow-the-Leader (FTL) and AdaptiveHedge algorithms, made progress towards resolving this question, but there is still room for better algorithms. FlipFlop retains the $O(\sqrt{T})$ regret bound in the worst case, has better guarantees in the stochastic setting, and has been shown never to perform substantially worse than FTL.
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Challenging Problem:</b> develop an efficient unified algorithm that attains $O(\log T)$ and $O(\sqrt{T})$ regret bounds in the stochastic and adversarial settings, respectively. Such an algorithm need not be a combination of algorithms from the two settings; it could be an entirely new algorithm with the guarantees above.</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-84076559715920365642012-11-10T22:49:00.003-08:002013-01-09T18:34:27.238-08:00NIPS 2012 Papers on Online Learning and Convex Optimization<div style="text-align: justify;">
Here is a list of interesting papers accepted at NIPS 2012 related to online learning and convex optimization:
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>A. Online Learning
</b>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- No-Regret Algorithms for Unconstrained Online Convex Optimization
</div>
<div style="text-align: justify;">
- Mirror Descent Meets Fixed Share (and feels no regret)
</div>
<div style="text-align: justify;">
- Relax and Randomize : From Value to Algorithms
</div>
<div style="text-align: justify;">
- Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
</div>
<div style="text-align: justify;">
- Confusion-Based Online Learning and a Passive-Aggressive Scheme
</div>
<div style="text-align: justify;">
- Risk-Aversion in Multi-armed Bandits
</div>
<div style="text-align: justify;">
- Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>B. Convex Optimization</b>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- Optimal Regularized Dual Averaging Methods for Stochastic Optimization
</div>
<div style="text-align: justify;">
- Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form
</div>
<div style="text-align: justify;">
- Query Complexity of Derivative-Free Optimization
</div>
<div style="text-align: justify;">
- Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
</div>
<div style="text-align: justify;">
- A quasi-Newton Proximal Splitting Method
</div>
<div style="text-align: justify;">
- Stochastic Gradient Descent with Only One Projection
</div>
<div style="text-align: justify;">
- A Stochastic Gradient Method with an Exponential Convergence Rate with Finite Training Sets
</div>
<div style="text-align: justify;">
- Communication/Computation Tradeoffs in Consensus-Based Distributed Optimization
</div>
<div style="text-align: justify;">
- Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
</div>
<div style="text-align: justify;">
- Feature Clustering for Accelerating Parallel Coordinate Descent</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-68161349897681008562012-10-23T13:29:00.003-07:002013-01-09T18:34:35.047-08:00Online Convex Optimization with Switching Costs<div style="text-align: justify;">
Recently at COLT 2012, an excellent paper entitled <a href="http://jmlr.csail.mit.edu/proceedings/papers/v23/buchbinder12.html">Unified Algorithms for Online Learning and Competitive Analysis</a> made progress towards unifying the algorithms of two different communities: <i>online learning</i> and <i>competitive analysis</i>. There are a few crucial differences in the assumptions and goals of these two frameworks for online decision making, which is why there has not been much research connecting them. First, in competitive analysis (a.k.a. online algorithms) there is a notion of state, usually modeled as a switching or movement cost. Second, in online algorithms one assumes that the cost vector for the next round is known to the algorithm before it makes its decision; in contrast, in online learning, the decision maker first makes the decision for the next round and only afterwards observes the cost vector.
One interesting question in connecting these two settings would be adding the switching cost to the regret incurred by the online learner. More specifically, one may consider $\|\mathbf{x}_t - \mathbf{x}_{t-1}\|$ as the movement cost at round $t$. Obviously, the decision maker needs to take the switching cost into account in order to minimize it. With this formulation, the learner competes with the best offline decision maker with bounded switching cost. When the bound on the switching cost is of the order of the number of rounds $T$, we recover the standard online learning setting. I will post more about this problem and some solutions soon, hopefully :-) </div>
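As a toy illustration of the objective above (a one-dimensional decision space with the absolute value as the norm is my simplification), the learner's total cost is the usual playing cost plus the movement term:

```python
def total_cost(decisions, cost_fns):
    """Playing cost sum_t f_t(x_t) plus movement cost sum_t |x_t - x_{t-1}|."""
    play = sum(f(x) for f, x in zip(cost_fns, decisions))
    move = sum(abs(b - a) for a, b in zip(decisions, decisions[1:]))
    return play + move

# with identical per-round costs, a learner that jumps around pays
# strictly more than one that stays put, through the movement term
fns = [lambda x: x * x] * 3
jumpy = total_cost([0.0, 1.0, 0.0], fns)
still = total_cost([0.0, 0.0, 0.0], fns)
```

The offline comparator in the proposed setting would minimize this same quantity subject to a budget on the movement term.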
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-92077099206668722792012-10-23T13:15:00.000-07:002013-01-09T18:36:02.239-08:00Stochastic Gradient Descent with only ONE Projection<div style="text-align: justify;">
With the increasing amount of data available for training, it has become an urgent task to devise efficient algorithms for optimization/learning problems of unprecedented size. Online learning algorithms, such as the celebrated Stochastic Gradient Descent (SGD) and its counterpart Online Gradient Descent (OGD), despite their slower convergence rate compared with batch methods, have been shown to be very effective for large-scale and online learning problems, both theoretically and empirically. Although a large number of iterations is usually needed to obtain a solution of desirable accuracy, the lightweight computation per iteration makes SGD attractive for many large-scale learning problems.
To find a solution within the domain $\mathcal{K}$ that optimizes the given objective function $f(\mathbf{x})$, SGD computes an unbiased estimate of the gradient of $f(\mathbf{x})$ and updates the solution by moving it in the opposite direction of the estimated gradient. To ensure that the solution stays within the domain $\mathcal{K}$, SGD has to project the updated solution back into $\mathcal{K}$ at <i>every iteration</i>. For complex domains, such as the positive semidefinite (PSD) cone in metric learning or bounded trace norm matrices in matrix completion, the projection step requires solving an expensive convex optimization problem, leading to a high computational cost per iteration and consequently making SGD unappealing for large-scale optimization over such domains. For instance, projecting a matrix onto the PSD cone requires computing the full eigendecomposition of the matrix, whose complexity is cubic in the size of the matrix. The question is: can we somehow reduce the number of projection steps? Our recent work solves this problem and will appear at <a href="http://books.nips.cc/papers/files/nips25/NIPS2012_0254.pdf">NIPS 2012</a>.</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-39415121780274426492012-09-05T08:58:00.000-07:002013-01-09T18:36:18.793-08:00Hedging when the Number of Good Experts is Small<div style="text-align: justify;">
In many online decision making algorithms, such as the Weighted Majority and Hedge algorithms, we treat all experts in the same way, and the regret bounds depend on the number of experts. But in many applications we know that the number of good experts (i.e., experts with small cumulative loss) is small, or that the experts behave similarly enough to be grouped into a small number of clusters. This raises the question of whether we can bound the regret in terms of the number of good experts or the number of clusters of experts. Although in the full-information setting the regret bound depends only logarithmically on the number of experts, a bound with logarithmic dependence on the number of good experts would be a significant improvement in applications with combinatorial structure or an exponential number of decisions. </div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-42507222067313405082012-07-27T12:17:00.003-07:002013-01-09T18:37:18.681-08:00Online Learning vs Online Algorithms<div style="text-align: justify;">
Recently, I got the chance to go through the following excellent paper from COLT 2012, entitled <a href="http://jmlr.csail.mit.edu/proceedings/papers/v23/buchbinder12.html">Unified Algorithms for Online Learning and Competitive Analysis</a>. When you read the literature on online learning and try to make a connection to online algorithms, you may end up with only vague thoughts and no serious relation (at least this happened to me when I was reading Chapter 5 of the book Prediction, Learning, and Games). Avrim Blum et al. did excellent work relating learning from expert advice to metrical task systems (MTS), but the current paper can really be considered a breakthrough in relating two inherently different problems with similar structure. Another nice thing about this paper is that it was done by a combination of researchers from the two fields, i.e., online learning and competitive analysis. I really liked the paper, and it is one I had been looking forward to for the last two years :-) </div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-66468331050580421752012-06-07T11:51:00.003-07:002013-01-09T18:37:02.372-08:00Online Learning with Budgeted Learner<div style="text-align: justify;">
To motivate this problem, consider an online marketing problem. In the standard setting, a seller has an unlimited supply of goods with zero copying cost and attends to customers one by one. For each customer, the seller proposes a price; if the customer is happy with that price (i.e., the price is less than or equal to his valuation, which is unknown to the seller), he will buy the product; otherwise, the seller moves on to the next customer. We can cast this problem as a multi-armed bandit (MAB) problem by discretizing the prices and treating each price as an arm. But this setting fails to model real situations, because it does not capture the following scenarios:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
(1) There is a limited number of goods to be sold</div>
<div style="text-align: justify;">
(2) There is a cost associated with each round, e.g., the time the seller spends</div>
<div style="text-align: justify;">
(3) The seller is supposed to sell at least a fixed predefined number of products</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The second and third scenarios can be modeled as online learning with constraints, which has recently attracted a lot of interest in the learning community. However, in this post our main focus is on the first case, where the learner/seller has a limited number of goods. The question is how the learner can maximize his revenue given that he has only a fixed number of goods to sell. This problem is involved, since it combines several problems into one: online learning, pricing strategy, and budgeted learning. I have not checked the literature to find out whether any solution has been proposed, but I believe it is an interesting problem to investigate. The first question one would ask at first glance is whether we can reduce this problem to existing MAB problems and utilize known algorithms, such as UCB in the stochastic setting and EXP3 in the adversarial setting, to tackle it.</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-75291415574028213722011-12-02T09:41:00.001-08:002013-01-09T18:36:32.163-08:00Regret Bound by Variation for Online Convex Optimization<div style="text-align: justify;">
Recently, we tried to find <a href="http://arxiv.org/abs/1111.6337">regret bounds by variation</a> for the general Online Convex Optimization (OCO) problem. If one goes through Chapter 2 of <a href="http://books.google.com/books?id=zDnRBlazhfYC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false">Prediction, Learning, and Games</a>, one can easily find three different ways to express the regret bound: in terms of the number of rounds $T$, the loss of the best expert, and the variation of the cost vectors. In <a href="http://www.springerlink.com/content/68386v04t03752n1/">Extracting certainty from uncertainty: regret bounded by variation in costs</a>, the authors extended this line of work to online linear optimization and showed that its regret can be bounded by the total variation of the cost vectors. In <a href="http://arxiv.org/abs/1111.6337">our work</a>, we extended this result to general OCO. We first analyze the limitations of their algorithm when applied to online convex optimization. We then present two algorithms for online convex optimization whose regret is bounded by the variation of the cost functions. Finally, we consider the bandit setting and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that this bound is optimal when the variation of the cost functions is independent of the number of trials.</div>
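For concreteness, the variation quantity in the linear case is the total squared deviation of the cost vectors from their mean; a small helper (the toy sequences are mine) makes the stochastic-versus-adversarial contrast visible:

```python
def variation(costs):
    """Total squared l2 deviation of a cost-vector sequence from its mean:
    VAR_T = sum_t ||c_t - mu||^2, the quantity that replaces T in the bound."""
    T, d = len(costs), len(costs[0])
    mu = [sum(c[j] for c in costs) / T for j in range(d)]
    return sum(sum((c[j] - mu[j]) ** 2 for j in range(d)) for c in costs)

# a constant ("stochastic-looking") sequence has zero variation, so a
# variation-based bound is far stronger than the worst-case O(sqrt(T));
# an alternating (adversarial-looking) sequence has variation linear in T
const_var = variation([[1.0, 0.0]] * 100)
alt_var = variation([[1.0, 0.0], [0.0, 1.0]] * 50)
```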
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-32766242491811044022011-12-02T08:57:00.001-08:002013-01-09T18:37:38.914-08:00Online Convex Optimization without Projection<div style="text-align: justify;">
In this post I would like to briefly discuss our recent work in Online Convex Optimization (OCO), namely <a href="http://arxiv.org/abs/1111.6082">OCO with Long Term Constraints</a>. Classical gradient descent techniques usually require a projection step in each iteration in order to get back to the feasible domain. For simple convex domains like the Euclidean ball and the simplex, projection is well studied and efficient algorithms exist; however, for a variety of applications it is a non-trivial and computationally burdensome step. One prominent example is semidefinite optimization, where projecting an arbitrary symmetric matrix back onto the PSD matrices requires computing a complete eigenvalue decomposition. Compared with the linear-time update of first-order gradient methods, this cost makes it worthwhile to avoid the projection. In OCO this is even more crucial, since each step is supposed to be efficient. We tackled this problem by posing a new OCO problem: replace the complex convex domain with a simpler one that admits an efficient projection algorithm, and see how this affects the overall performance of the algorithm. The solution we proposed is interesting in its own right, and it leaves some open problems that need thorough investigation.</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-38488085014064948702011-09-19T09:24:00.000-07:002011-09-19T09:24:59.272-07:00COLT 2011 Accepted PapersThe list of <a href="http://colt2011.sztaki.hu/accepted_papers.html">accepted papers for COLT 2011</a> is available here. I will update this post by adding papers related to my own research soon.mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-61809117678588009202011-02-12T23:33:00.000-08:002013-01-09T18:36:42.247-08:00Online Learning with a Partially Observable Future Cost<div style="text-align: justify;">
The main idea in proving a regret bound for the Follow the Perturbed Leader (FPL) algorithm is as follows: first, we define a hypothetical leader that knows the cost vector of round $t$ before predicting, and therefore attains zero regret. Then we can prove that by adding some perturbation to the cumulative cost vector up to time $t-1$, the Player attains an $O(\sqrt{T})$ regret bound. Now the question is: if we have some partial information about the future cost, or are allowed to query some information about the next round's cost, can we design algorithms with improved regret bounds? This problem is tackled in two recent papers. In <a href="http://www.springerlink.com/content/y61196x565w5054l/">Online Learning with Prior Knowledge</a>, the authors proposed an algorithm which utilizes such information as a kind of prior. Recently, in <a href="http://portal.acm.org/citation.cfm?id=1873653">Online Learning with Queries</a>, a variant of the Weighted Majority algorithm that can query limited information about the future cost vector was proposed.</div>
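For reference, here is a minimal sketch of the FPL prediction rule itself (the uniform perturbation on the scale $1/\epsilon$ follows the standard construction; the perfect-expert example is mine):

```python
import random

def fpl(loss_rounds, epsilon=0.1, seed=0):
    """Follow the Perturbed Leader: each round, play the expert whose
    cumulative loss plus a fresh random perturbation is smallest."""
    rng = random.Random(seed)
    n = len(loss_rounds[0])
    cum = [0.0] * n
    choices = []
    for losses in loss_rounds:
        pert = [rng.uniform(0.0, 1.0 / epsilon) for _ in range(n)]
        choices.append(min(range(n), key=lambda i: cum[i] + pert[i]))
        cum = [c + l for c, l in zip(cum, losses)]
    return choices

# expert 0 never errs: once its loss gap exceeds the perturbation scale
# (here 1/epsilon = 10), FPL follows it on every remaining round
picks = fpl([[0.0, 1.0]] * 50)
```

Any partial knowledge of the round-$t$ cost would let the player shift weight from the random perturbation toward the true future cost, which is the intuition behind both papers above.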
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-41333694383696940562011-02-12T17:25:00.000-08:002011-02-12T19:37:24.084-08:00No Free Experts, Hedging on CostConsider the following setting for learning from expert advice. In each round of the game between Player and Adversary, the Player can ask for the experts' suggestions, but only by paying an amount of money determined by each expert at the beginning of the round. In other words, at the beginning of each trial, the Player receives a cost vector $c = [c_1, \cdots, c_n]$ which determines the cost of querying each expert for that round. Let us also assume that the Player has a budget constraint, imposed either per round or as a long-term constraint over all rounds. The goal of the Player is to attain a nontrivial $o(T)$ regret bound under the budget constraint. Formally, the Player tries to minimize $\sum_{t=1}^{T}{f_t(x_t)}-\sum_{t=1}^{T}{f_t(x^*)}$ while simultaneously satisfying the constraint $\sum_{t=1}^{T}{g(x_t)} \leq \delta T$. 
<br /><br />The constraint above is a long-term constraint; alternatively, we can define a round-based constraint as $\sum_{t=1}^{T}{\mathbb{I}(g(x_t) \geq \delta)} \leq B$.
This is a new thought that has attracted me. In all online learning algorithms, we assume that the loss function is bounded. Also, if we do not use perturbation in our algorithm, the loss function must be convex in its first argument. So the question is whether there is any way to relax these assumptions; more specifically:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) How can we get rid of the boundedness constraint?</div>
<div style="text-align: justify;">
2) How can we use non-convex loss functions in learning?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The motivation for the second problem is the well-known 0-1 loss function. We know that by adding some randomization to our prediction, this loss function behaves like the absolute loss. So, can we generalize this idea to other non-convex losses, or perhaps to other soft constraints? Are there any real applications for such loss functions?</div>
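The randomization argument can be spelled out exactly: predicting label 1 with probability $p$ against a binary label $y$ gives an expected 0-1 loss of $p(1-y) + (1-p)y = |p - y|$, i.e., the convex absolute loss. A two-line check:

```python
def expected_zero_one(p, y):
    """Expected 0-1 loss of a randomized prediction (predict 1 w.p. p)
    against a label y in {0, 1}: P(wrong) = p*(1-y) + (1-p)*y."""
    return p * (1 - y) + (1 - p) * y

# the expectation coincides with the absolute (convex) loss everywhere
checks = [(p / 4.0, y) for p in range(5) for y in (0, 1)]
ok = all(expected_zero_one(p, y) == abs(p - y) for p, y in checks)
```

The open question is whether a similar smoothing exists for other non-convex losses.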
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I will post more on this problem as I learn more.</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-72269665600575996922010-03-18T09:27:00.000-07:002013-01-09T18:38:09.449-08:00Betting on Permutation using Expert Advice<div style="text-align: justify;">
Today I had the chance to read the original paper on the Hedge algorithm. The motivating problem is betting on horse races: a gambler decides to allow a group of his friends to make bets on his behalf, and the question is how to combine the friends' decisions. This is exactly the setting of learning from expert advice. After reading it, I thought about betting languages. Betting on a single horse to win is a special case of a betting language. Permutation betting and interval betting are other betting languages: in permutation betting you bet on a permutation of all the horses, and in interval betting you bet on a subset of $k$ horses to take the first $k$ ranks. Yet another betting language is pairwise betting, in which you bet on a mapping between horses and ranks. Now the question is: can we apply Hedge, or other algorithms of this type, to learn to bet in these languages?</div>
mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-61163732512861150572010-03-01T09:05:00.000-08:002010-03-01T09:25:00.762-08:00Model selection in Online learningReading the paper "On the Generalization Ability of On-Line Learning <br />Algorithms", I felt that the i.i.d. assumption was not a big deal. Then I read the excellent post <a href="http://www.inherentuncertainty.org/2009_10_01_archive.html">Adaptivity in online models?</a> and changed my mind. The point is that the i.i.d. assumption is not just for ensuring the generalization ability of learning; it is also useful for model selection. In regularized error minimization for machine learning algorithms, we have a trade-off parameter between the error and the regularization. Under the i.i.d. assumption on the samples, we can employ cross-validation to find a value for this parameter. In other words, if we want to choose an appropriate value for the regularization parameter, we can split the samples and use a validation set to find a good approximation of it.<br /><br />So, the question is: how can we choose such parameters in online learning algorithms, with no i.i.d. assumption and without having all the samples in advance? At first glance, adaptive adjustment of the parameter is the obvious solution. Are there other methods, or can current methods be improved considering the inherent limitations of the online learning setting?mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-16214785582622064182010-02-19T08:41:00.000-08:002010-02-19T08:51:30.073-08:00Theories in Statistical learning theoryIn my view, the two most important questions regarding learning algorithms are generalization ability and stability analysis. 
When I started working on learning algorithms, I asked myself this question: "Why can a learning algorithm predict future samples just by finding a classifier that minimizes the empirical error on the training samples?". I have encountered a lot of theories and concepts while looking for an answer, and I want to deepen my understanding of them. Note that not all of the following concepts are aimed directly at answering this question, but they are all related to it.<br /><br />1) No free lunch theorem<br />2) PAC learning (realizable and agnostic)<br />3) Occam learning<br />4) VC dimension<br />5) Covering numbers<br />6) Growth functions<br />7) Rademacher complexity<br /><br />Wait for my future post for a survey of these concepts.mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com1tag:blogger.com,1999:blog-6124155441749233519.post-58399202241778788082010-02-19T08:24:00.000-08:002011-02-12T20:19:48.568-08:00Concentration inequalitiesProving generalization bounds for most learning algorithms requires a good knowledge of concentration inequalities, in addition to a rich knowledge of probability theory and sometimes advanced topics such as martingales. <br /><br />Concentration inequalities bound the tail probabilities of general functions of independent random variables. Several methods are known for proving such inequalities, including:<br /><br />1) Martingale methods<br />2) Information-theoretic methods<br />3) Talagrand's induction method<br />4) Some problem-specific methods<br /><br />Many inequalities have been proposed, such as Hoeffding, Bernstein, Azuma, McDiarmid, and so on. I have started to enrich my knowledge in this area. 
In future posts I will talk more about them, or perhaps provide a survey of what I learn.mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0tag:blogger.com,1999:blog-6124155441749233519.post-2411932380318057332010-02-06T19:58:00.000-08:002010-02-06T20:45:44.412-08:00Prediction, Learning and GamesWhen I started working on online learning algorithms, my adviser Dr. Jin introduced me to the excellent book <a href="http://books.google.com/books?id=zDnRBlazhfYC&dq=prediction+learning+and+games&printsec=frontcover&source=bn&hl=en&ei=VTxuS7nfEY_WNbbzpckE&sa=X&oi=book_result&ct=result&resnum=4&ved=0CB4Q6AEwAw#v=onepage&q=&f=false">Prediction, Learning, and Games</a>, and I started reading it. At first glance it seems easy, but as I went deeper into the concepts I found it very challenging. I will talk about the chapters in future posts.mehrdad.mahdavihttp://www.blogger.com/profile/12681833136414364145noreply@blogger.com0