W. Krichene, M. C. Bourguiba, K. Lam, and A. Bayen. On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game.
ACM Transactions on Cyber-Physical Systems - Special Issue.
bibtex
abstract
pdf
@article{krichene2018tcps,
author = {Krichene, Walid and Bourguiba, Mohamed Chedhli and Tlam, Kiet and Bayen, Alexandre},
title = {On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game},
journal = {ACM Trans. Cyber-Phys. Syst.},
issue_date = {February 2018},
volume = {2},
number = {1},
month = jan,
year = {2018},
issn = {2378-962X},
pages = {6:1--6:23},
articleno = {6},
numpages = {23},
doi = {10.1145/3078620},
acmid = {3078620},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Routing game, behavioral experiment, sequential decision model},
}
The routing game models congestion in transportation networks, communication networks, and other cyber-physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network), then the joint decision of all players determines the costs of each path, which are then revealed to the players.
We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the parameters of the learning model. We consider, in particular, entropic mirror descent dynamics and reduce the problem to estimating the learning rates of each player.
In order to demonstrate our methods, we developed a web application that allows players to participate in a distributed, online routing game, and we deployed the application on Amazon Mechanical Turk. When players log in, they are assigned an origin and destination on a shared network. They can choose, at each iteration, a distribution over their available routes, and each player seeks to minimize her own cost. We collect a dataset using this platform, then apply the proposed method to estimate the learning rates of each player. We observe, in particular, that after an exploration phase, the joint decision of the players remains within a small distance of the set of equilibria. We also use the estimated model parameters to predict the flow distribution over routes, and compare our predictions to the actual distributions, showing that the online learning model can be used as a predictive model over short horizons. Finally, we discuss some of the qualitative insights from the experiments, and give directions for future research.