To motivate this problem let us consider the online marketing problem. In the standard setting, a learner has a unlimited number of goods with zero copy cost who attends customers one by one. For each customer the seller proposes a price and if the customer be happy with that price (i.e., the price is less than equal to his valuation which is unknown to the seller), he/she will buy the product; otherwise, the seller goes to the next customer. We can cast this problem as a multi armed bandit (MAB) problem by discretizing the prices and correspond each price as an arm. But this setting fails to model real situations owing to a lack of modeling following scenarios:

(1) There exists a limited number of goods to be sold

(2) There is a cost associated with each round, e.g., the time the seller spends

(3) The seller is supposed to sell at least a fixed predefined number of products

The second and third scenarios can be modeled as an online learning with constraints recently attracted a lot of interest in learning community. However, in this post, our main focus is on the first case where the learner/seller has a limited number of goods. Now, the question is that how the learner can maximize his revenue by considering the fact that he has a fixed number of goods to sell. This problem is more involved since it combines different problems as a single problem: online learning, pricing strategy, and budgeted learning.
I have not checked the literature to find out is there any proposed solution, but I believe it is an interesting problem to be investigated. The first simple question one would ask at first glance is that can we reduce this problem to the existing MAB problems and utilize the known algorithms such as UCB in stochastic setting and EXP3 in adversarial setting to tackle it?

## No comments:

## Post a Comment