Last-iterate Convergence in No-regret Learning: Games with Reference Effects Under Logit Demand

88 Pages · Posted: 7 Nov 2023 · Last revised: 11 Mar 2025

Mengzi Amy Guo

University of California, Berkeley - Department of Industrial Engineering and Operations Research

Donghao Ying

University of California, Berkeley - Department of Industrial Engineering and Operations Research

Javad Lavaei

University of California, Berkeley

Zuo-Jun Max Shen

University of California, Berkeley - Department of Industrial Engineering & Operations Research (IEOR)

Date Written: October 10, 2023

Abstract

This work studies algorithm design for oligopoly price competition, with the primary goal of examining long-run market behavior. We consider a realistic setting in which n firms engage in multi-period price competition under partial information---each firm observes only its own first-order feedback and has no information about its competitors---and under the influence of reference effects. Consumers assess their willingness to pay by comparing the current price against a memory-based reference price, and their choices follow the multinomial logit (MNL) model. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy, to capture long-run equilibrium and stability simultaneously. With loss-neutral reference effects, we propose the online projected gradient ascent (OPGA) algorithm, in which each firm adjusts its price using the first-order derivative of its log-revenue, accessible through the market feedback mechanism. Despite the absence of properties typically required for the convergence of online games, such as strong monotonicity and variational stability, we show that under diminishing step-sizes, the price and reference-price paths generated by OPGA attain last-iterate convergence to the unique SNE, thereby guaranteeing no-regret learning and market stability. Moreover, with appropriately chosen step-sizes, we prove that the algorithm exhibits a convergence rate of O(1/t^2) and achieves constant dynamic regret. The inherently asymmetric nature of reference effects motivates exploration beyond loss-neutrality. When loss-averse reference effects are introduced, we propose the conservative OPGA (C-OPGA) algorithm to handle the resulting non-smooth revenue functions and show that the price and reference price achieve last-iterate convergence to the set of SNEs at a rate of O(1/\sqrt{t}). Finally, we demonstrate the practicality and robustness of OPGA and C-OPGA by showing theoretically that these algorithms also accommodate firm-differentiated step-sizes and inexact gradients.
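To make the update rule concrete, the sketch below implements an OPGA-style loop under one common specification of logit demand with reference effects: firm i's deterministic utility is taken as a_i - b_i*p_i + c_i*(r_i - p_i) (loss-neutral), and the reference price is an exponentially smoothed memory of past prices with parameter alpha. These functional forms, the 1/t step-size, and all parameter values are illustrative assumptions for this sketch, not the paper's exact model or step-size schedule.

# A minimal sketch of an OPGA-style update loop (assumptions noted above).
import numpy as np

def mnl_demand(p, r, a, b, c):
    """Purchase probabilities under logit demand with loss-neutral reference effects."""
    u = a - b * p + c * (r - p)          # deterministic utility of each firm's product
    expu = np.exp(u)
    return expu / (1.0 + expu.sum())     # outside option has utility 0

def log_revenue_grad(p, r, a, b, c):
    """d/dp_i of log(p_i * d_i(p, r)); each firm only needs its own component."""
    d = mnl_demand(p, r, a, b, c)
    return 1.0 / p - (b + c) * (1.0 - d)

def opga(a, b, c, p0, r0, p_lo, p_hi, alpha=0.7, T=5000):
    """Projected gradient ascent on log-revenue with a diminishing step-size."""
    p, r = p0.copy(), r0.copy()
    for t in range(1, T + 1):
        eta = 1.0 / t                                  # diminishing step-size (illustrative)
        p = p + eta * log_revenue_grad(p, r, a, b, c)  # simultaneous first-order updates
        p = np.clip(p, p_lo, p_hi)                     # projection onto the price interval
        r = alpha * r + (1 - alpha) * p                # memory-based reference-price update
    return p, r

# Three symmetric firms; all parameter values are made up for illustration.
a = np.array([2.0, 2.0, 2.0]); b = np.array([1.0, 1.0, 1.0]); c = np.array([0.5, 0.5, 0.5])
p, r = opga(a, b, c, p0=np.ones(3), r0=np.ones(3), p_lo=0.1, p_hi=10.0)
print("approximate stationary prices:", p)

The projection step keeps each price in a bounded interval, mirroring the projected-gradient structure, and the 1/t schedule stands in for the diminishing step-sizes under which last-iterate convergence to the SNE is established; the C-OPGA variant for loss-averse effects would replace the gradient with a conservative surrogate for the non-smooth revenue and is not sketched here.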

Keywords: last-iterate convergence, price competition, reference effect, multinomial logit model

Suggested Citation

Guo, Mengzi Amy and Ying, Donghao and Lavaei, Javad and Shen, Zuo-Jun Max, Last-iterate Convergence in No-regret Learning: Games with Reference Effects Under Logit Demand (October 10, 2023). Available at SSRN: https://ssrn.com/abstract=4597658 or http://dx.doi.org/10.2139/ssrn.4597658

Mengzi Amy Guo (Contact Author)

University of California, Berkeley - Department of Industrial Engineering and Operations Research

4141 Etcheverry Hall
Berkeley, CA 94720-1777
United States

Donghao Ying

University of California, Berkeley - Department of Industrial Engineering and Operations Research

4141 Etcheverry Hall
Berkeley, CA 94720-1777
United States

Javad Lavaei

University of California, Berkeley

310 Barrows Hall
Berkeley, CA 94720
United States

Zuo-Jun Max Shen

University of California, Berkeley - Department of Industrial Engineering & Operations Research (IEOR) ( email )

IEOR Department
4135 Etcheverry Hall
Berkeley, CA 94720
United States

Paper statistics

Downloads: 151 · Abstract Views: 801 · Rank: 420,289