Competitive Pricing Using Model-Based Bandits
Lukasz Sliwinski, Tanut Treetanthiploet, David Siska, Lukasz Szpruch
Abstract
The use of learning algorithms for automatic price adjustments in markets is on the rise. However, these algorithms often assume that reward distributions for actions are uncorrelated and stationary, a condition that does not hold in competitive pricing environments. In this paper, we introduce a pricing environment, find conditions under which a unique Nash equilibrium exists and verify the assumptions numerically. Then, we propose a bandit algorithm that approximates the structure of the environment and extend it to accommodate non-stationary settings. We perform numerical tests in both stationary and competitive pricing environments, analysing the potential benefits and drawbacks of incorporating the structure of the environment within learning algorithms. While modelling the stationary environment improves the algorithm's performance in a stationary setting, it does not offer an advantage in pricing competitions between non-stationary learning agents.