Thursday, March 18, 2010

Online Learning with Unbounded and Non-convex Loss Function

Actually, this is a new thought that has attracted me. In the standard online learning algorithms, we assume that the loss function is bounded. Also, if we do not use perturbation in the algorithm, the loss function should be convex in its first argument. So the question is whether there is any way to relax these assumptions; more specifically:

1) How can we get rid of the boundedness constraint?
2) How can we use non-convex loss functions in learning?

The motivation for the second problem is the well-known 0-1 loss function. We know that by adding some randomization to our prediction, this loss function behaves like the absolute loss function in expectation. So, can we generalize this idea to other non-convex losses, or maybe to some other soft constraints? Are there any real applications for such loss functions?
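To make the randomization idea concrete, here is a minimal sketch of my own (in Python, not taken from any paper) for binary labels: predicting 1 with probability p makes the expected 0-1 loss equal to the absolute loss |p - y|, which is convex (in fact linear) in p.

    import random

    def expected_zero_one_loss(p, y, trials=100000):
        # Predict label 1 with probability p and 0 otherwise; average the 0-1 loss.
        losses = 0
        for _ in range(trials):
            pred = 1 if random.random() < p else 0
            losses += int(pred != y)
        return losses / trials

    # For a binary label y, the expectation equals the absolute loss |p - y|.
    for p, y in [(0.3, 1), (0.3, 0), (0.8, 1)]:
        print(p, y, expected_zero_one_loss(p, y), abs(p - y))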

I will post more about this problem as I enrich my knowledge.

Betting on Permutations Using Expert Advice

Today I had a chance to read the original paper on the Hedge algorithm. The motivating problem is betting on horse races: a gambler decides to allow a group of his friends to make bets on his behalf, and the question is how to combine the friends' decisions. This is exactly the setting of learning with expert advice. After reading it, I thought about betting languages. Betting on a single horse as the winner is just one betting language. Permutation betting and interval betting are others: in permutation betting you bet on a permutation of all the horses, and in interval betting you bet on a subset of k horses to occupy the first k ranks. Yet another betting language might be pairwise betting, in which you bet on a mapping between horses and ranks. Now the question is: can we apply Hedge or other algorithms to learn bets expressed in these betting languages?
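For reference, a minimal sketch of the Hedge update for the simplest betting language, winner-only bets. The variable names and the exponential form of the update are my own choices; the original paper uses a parameter beta in (0,1) with an equivalent multiplicative update.

    import numpy as np

    def hedge(expert_losses, eta=0.5):
        # expert_losses: T x N matrix of losses in [0, 1], one column per friend.
        # Returns the weight vector used to combine the friends' bets at each round.
        T, N = expert_losses.shape
        weights = np.ones(N) / N
        history = []
        for t in range(T):
            history.append(weights.copy())
            # Multiplicative update: shrink the weight of every friend that lost.
            weights = weights * np.exp(-eta * expert_losses[t])
            weights /= weights.sum()
        return np.array(history)

    # Toy example: 3 friends, 5 races; the second friend is consistently best.
    losses = np.array([[0.9, 0.1, 0.5],
                       [0.8, 0.2, 0.4],
                       [1.0, 0.0, 0.6],
                       [0.7, 0.3, 0.5],
                       [0.9, 0.1, 0.4]])
    print(hedge(losses)[-1])

Extending this to richer betting languages is exactly the open question above: for permutation or interval betting the "experts" would be exponentially many permutations or subsets, so some structured or efficient version of the update would be needed.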

Monday, March 1, 2010

Model selection in Online learning

Reading the paper "On the Generalization Ability of On-Line Learning Algorithms", I felt that the i.i.d. assumption is not a big deal. Recently I read the excellent post Adaptivity in online models? and changed my mind. The point is that the i.i.d. assumption is not there only to ensure the generalization ability of learning; it is also useful for model selection. In regularized error minimization we have a trade-off parameter between the error term and the regularization term. With an i.i.d. assumption on the samples, we can employ cross-validation to pick a value for this parameter: we split the samples and use a validation set to find a good approximation of the parameter.

So, the question is how we can choose such parameters in online learning algorithms, with no i.i.d. assumption and without having all the samples in advance. At first glance, adaptively adjusting the parameter is the obvious solution. Are there other methods, or can current methods be improved given the inherent limitations of the online learning setting?
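One possible adaptive strategy, sketched below as an assumption of my own (not from the paper or the post): run one copy of the online learner per candidate regularization parameter and mix their predictions with Hedge-style weights, so the data stream itself plays the role of the validation set.

    import numpy as np

    def online_ridge_with_hedge(X, y, lambdas=(0.01, 0.1, 1.0), eta=1.0, lr=0.05):
        # One online ridge-regression learner per candidate lambda, mixed by Hedge.
        n, d = X.shape
        K = len(lambdas)
        W = np.zeros((K, d))      # one weight vector per candidate lambda
        mix = np.ones(K) / K      # Hedge weights over the candidates
        for t in range(n):
            preds = W @ X[t]      # each candidate's prediction
            # The mixture prediction mix @ preds is what would actually be issued.
            losses = np.clip((preds - y[t]) ** 2, 0.0, 1.0)
            mix = mix * np.exp(-eta * losses)   # reweight the candidates
            mix /= mix.sum()
            for k, lam in enumerate(lambdas):   # regularized gradient step each
                grad = 2 * (preds[k] - y[t]) * X[t] + 2 * lam * W[k]
                W[k] -= lr * grad
        return mix, W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
    mix, W = online_ridge_with_hedge(X, y)
    print(mix)  # mass shifts toward the better-suited regularization value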

Friday, February 19, 2010

Theories in Statistical learning theory

In my view, the two main questions regarding learning algorithms are generalization ability and stability analysis. When I started working on learning algorithms, I asked myself this question: "Why can a learning algorithm predict future samples simply by finding a classifier that minimizes the empirical error on the training samples?". I have encountered a lot of theories and concepts while looking for an answer, and I want to deepen my understanding of them. Please note that not all of the following concepts address this question directly, but they are all related to it:

1) No free lunch theorem
2) PAC learning (realizable and agnostic)
3) Occam learning
4) VC dimension
5) Covering numbers
6) Growth functions
7) Rademacher complexity

Wait for my future post for a survey of these concepts.
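To give a flavor of the kind of answer these tools provide, here is the classical uniform-convergence bound for a finite hypothesis class, stated from memory (it follows from Hoeffding's inequality plus a union bound): with probability at least 1 - delta over an i.i.d. sample of size m, every hypothesis h in H satisfies

    \[
      R(h) \;\le\; \hat{R}_m(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2m}},
    \]

where R(h) is the true risk and \hat{R}_m(h) the empirical risk on the m training samples. VC dimension, growth functions, covering numbers, and Rademacher complexity are different ways of replacing the \ln|\mathcal{H}| term when the hypothesis class is infinite.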

Concentration inequalities

Proving generalization bounds for most learning algorithms requires a good knowledge of concentration inequalities, on top of a rich knowledge of probability theory and sometimes more advanced topics such as martingales.

Concentration inequalities bound the tail probabilities of general functions of independent random variables. Several methods are known for proving such inequalities, including:

1) Martingale methods
2) Information-theoretic methods
3) Talagrand's induction method
4) Some problem-specific methods

Many inequalities have been proposed, such as Hoeffding, Bernstein, Azuma, McDiarmid, and so on. I have started to enrich my knowledge in this area. In future posts I will talk more about them, or maybe provide a survey document of what I learn.
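As a small example of the kind of statement involved, here is Hoeffding's inequality in its standard form (stated from memory): for independent random variables X_1, ..., X_n with each X_i taking values in [a_i, b_i],

    \[
      \Pr\!\left( \sum_{i=1}^{n} \big( X_i - \mathbb{E}[X_i] \big) \ge t \right)
      \;\le\; \exp\!\left( \frac{-2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
    \]

McDiarmid's inequality extends this from sums to any function of the X_i with bounded differences, which is the form most often used in generalization proofs.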

Saturday, February 6, 2010

Prediction, Learning, and Games

When I started working on online learning algorithms, my adviser Dr. Jin introduced me to the excellent book Prediction, Learning, and Games, and I started reading it. At first glance it seems easy, but as I went deeper into the concepts I found it quite challenging. I will talk about its chapters in future posts.