Self-Confirming Equilibrium: We Can Only Learn from the Consequences of Our Decisions


Policymakers have to make decisions under uncertainty, based on the outcomes of their previous decisions. Because the available data depend on the very decisions that generated them, this learning process is complicated to study. However, game-theoretical tools allow the investigation of such settings.
Consider, for example, monetary policies, which can be modelled as single-player games. Even though multiple parties are involved, only one (the central bank) makes strategic decisions, while the others (citizens, companies) can only adapt to the policy chosen by the central bank. This “game” then proceeds in time, with the single strategic player making recurrent decisions, observing their outcomes (which also depend on an underlying, unobservable stochastic shock) and updating their beliefs about the shock process. It is then of particular interest to investigate whether these sequences of policies and beliefs converge to a stationary (persistent) policy and belief. Moreover, it is paramount to characterize this equilibrium and to assess whether it is optimal in an objective sense, that is, given the true shock-generating process.

The Sliding Doors Effect and Policymaking

In a recent paper, co-authored with the Nobel laureate Thomas Sargent (New York University and Hoover Institution, Stanford), Bocconi professors Pierpaolo Battigalli, Simone Cerreia-Vioglio, Fabio Maccheroni, and Massimo Marinacci have answered these key questions through the concept of self-confirming equilibrium. Although it was introduced in 1987 by Battigalli in his undergraduate thesis, the concept of self-confirming equilibrium is still innovative, and is justified under more general conditions than other kinds of equilibria that are routinely employed in game theory and macroeconomics.
In this paper, Battigalli and coauthors have shown that a player (policymaker) can keep making suboptimal decisions, despite being guided by rational decision rules. Intuitively, this is because any player’s decisions are based on the information learned from the consequences of their previous decisions. In the movie Sliding Doors, we can observe the different destinies of Gwyneth Paltrow’s character depending on whether she catches a train or not. In reality, since players cannot observe the consequences of the choices they have not made, their knowledge is generally incomplete.
In particular, when the outcome of the current strategy is reasonably satisfactory, exploring alternative paths may be inconsistent with maximizing expected utility. Even if the player is somewhat patient and willing to sacrifice some immediate utility for the sake of learning, this balancing force will vanish in the long run, leaving the player stuck in a self-confirming equilibrium, which unfortunately may be suboptimal.
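The trap described above can be illustrated with a toy simulation. This is only a minimal sketch under strong simplifying assumptions and is not the model from the paper: the policymaker chooses each period between a "safe" policy with a known payoff and a "risky" policy whose success probability is unknown, and acts myopically on current beliefs. The function name `simulate`, the Beta-style belief counts, and all numerical values are hypothetical choices made for illustration.

```python
import random


def simulate(prior_successes, prior_failures, true_p_risky=0.7,
             safe_payoff=0.5, periods=1000, seed=0):
    """Myopic learner choosing between a safe and a risky policy.

    Beliefs about the risky policy are summarized by success/failure
    counts (a Beta-style posterior). The counts change only when the
    risky policy is actually tried: untried options yield no data.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    successes, failures = prior_successes, prior_failures
    choices = []
    for _ in range(periods):
        believed_p = successes / (successes + failures)
        if believed_p > safe_payoff:
            # Risky policy looks better: try it and learn from the outcome.
            choices.append("risky")
            if rng.random() < true_p_risky:
                successes += 1
            else:
                failures += 1
        else:
            # Safe policy looks better: nothing is learned about the risky one.
            choices.append("safe")
    return choices, successes / (successes + failures)


# Pessimistic prior (believed success rate 0.4, true rate 0.7).
pessimistic_choices, pessimistic_belief = simulate(2, 3)

# Optimistic prior (believed success rate 0.6): the risky policy gets tried.
optimistic_choices, optimistic_belief = simulate(3, 2)
```

With the pessimistic prior, the learner never tries the risky policy, so its belief is never contradicted by the data it observes, even though the risky policy is objectively better under the assumed true success rate. The belief confirms itself only because the alternative path is never explored.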
“This trap and the resulting suboptimality,” comments Professor Battigalli, “are very relevant to policymaking, especially when the targeted service is public welfare. When the stakes are so high, a principled and transparent mathematical formulation is essential. In fact, even if some of these concepts are somehow intuitive and retain their appeal even without technicalities, a mathematical framework is absolutely necessary to address these problems from a proper scientific standpoint.”
Battigalli, P., Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., & Sargent, T. (2022). “A framework for the analysis of self-confirming policies.” Theory and Decision, 92(3), 455-512.

by Sirio Legramanti | Source: Bocconi Knowledge