As is well known by now, the universal AI agent AIXI is built from two key components: Solomonoff Induction for universal sequential prediction, and expectimax search for planning. There are several proposed and reasonably effective approximations of the Solomonoff Induction component, using the factored, binarised Context Tree Weighting algorithm [WST95, VNHUS09] and its generalisation from suffix contexts to arbitrary predicates in a formal logic [YWN22]. Partly prompted by [YNH24], I discuss in this short note how we can approximate Solomonoff Induction using online prediction-with-experts algorithms like Exponential Weights / Hedge [CL06, AHK12] and their many special cases [HEK18].
The first thing to note is that while many of these algorithms have their roots in online convex optimisation, they can be interpreted as Bayesian mixtures when the loss function is the log loss [KR13]. In particular, the Hedge algorithm is an exact Bayesian mixture model when the loss function is the log loss and the learning rate is 1, in which case one can show that the Hedge weight of each expert is its posterior probability (see [CL06, sect 9.2]). The Prod algorithm with log loss [OLL17] has been shown to be a "Soft-Bayes" algorithm: it coincides with the exact Bayesian mixture when the learning rate is 1, and can be interpreted as a "slowed-down" Bayesian mixture, or a Bayesian mixture with "partially sleeping" experts, when the learning rate is less than 1. The general phenomenon is that when the true environment is contained in the model class, the optimal learning rate is 1 because exact Bayesian inference is then optimal. However, in the agnostic / improper learning setting, where the true environment may not be in the model class, the learning rate needs to decrease over time to avoid pathological behaviour, especially for non-convex model classes, as explained in [GO17, EGMNRW15].
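The Hedge–Bayes correspondence above can be made concrete in a few lines. The following is a minimal sketch (the function name and array layout are my own, not from the cited papers): each round, every expert's weight is multiplied by `exp(-eta * loss)`, and under log loss with `eta = 1` this multiplier is exactly the probability the expert assigned to the observed outcome, so the normalised weights are the Bayesian posterior.

```python
import numpy as np

def hedge_log_loss(expert_probs, eta=1.0):
    """Hedge over K experts under log loss.

    expert_probs[t, k] is the probability expert k assigned to the
    outcome actually observed at time t. With eta = 1, the returned
    weights coincide with the Bayesian posterior under a uniform prior.
    """
    T, K = expert_probs.shape
    w = np.full(K, 1.0 / K)                  # uniform prior
    for t in range(T):
        losses = -np.log(expert_probs[t])    # log loss of each expert
        w = w * np.exp(-eta * losses)        # multiplicative weights update
        w = w / w.sum()                      # normalise
    return w
```

Note that `exp(-1 * (-log p)) = p`, so with `eta = 1` the update is just Bayes' rule applied sequentially; with `eta < 1` the update is a tempered likelihood, which is where the "slowed-down" interpretation comes from.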
More generally, [KR13] shows that there is a spectrum of Bayesian mixture algorithms over expert sequences with different priors that can be understood as interpolations between Bayesian mixtures over fixed experts and (non-learning) element-wise mixtures. The spectrum includes Fixed-Share [HW98] with static Bernoulli switching frequencies, switching distributions with slowly decreasing switching frequencies [ERG07, EGR12] and dynamically learned switching frequencies [VW98], and more general switching frequencies that depend on some ordering on the experts [V97, KR13].
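To illustrate the simplest point on that spectrum, here is a sketch of Fixed-Share [HW98] under log loss (function name and array layout are my own): it is the Bayesian/Hedge update from before, followed by a "share" step that redistributes a fixed fraction `alpha` of the weight mass uniformly, which lets the mixture track a changing best expert. Setting `alpha = 0` recovers the plain Bayesian mixture over fixed experts.

```python
import numpy as np

def fixed_share(expert_probs, alpha=0.05, eta=1.0):
    """Fixed-Share: Hedge under log loss with a static switching rate alpha.

    expert_probs[t, k] is the probability expert k assigned to the
    outcome observed at time t. Returns the mixture predictions and
    the final weights.
    """
    T, K = expert_probs.shape
    w = np.full(K, 1.0 / K)
    preds = []
    for t in range(T):
        preds.append(w @ expert_probs[t])        # mixture prediction
        w = w * expert_probs[t] ** eta           # loss update (tempered Bayes)
        w = w / w.sum()
        w = (1 - alpha) * w + alpha / K          # share step: uniform leakage
    return np.array(preds), w
```

The share step is exactly a Bernoulli-`alpha` prior on switching experts between rounds, which is why Fixed-Share sits at the static end of the switching-frequency spectrum.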
Such online-learning algorithms can exhibit good regret-minimising convergence behaviour under different scenarios. In the context of multi-agent systems, regret-minimisation learning algorithms are particularly useful in that one can show that a set of agents that all minimise a notion of regret called swap regret will converge to a correlated equilibrium [A87], in which no agent would want to deviate from its strategy assuming the others also do not deviate. In [BM07], the authors describe a general procedure for turning agents that minimise the standard (external) form of regret into agents that minimise swap regret, which is regret measured against strategy modifications that swap any specific action for some other action with the benefit of hindsight. This is achieved via a master algorithm that runs multiple instances of a Hedge-like algorithm, one for each possible action, and combines their predictions using the stationary distribution of the row-stochastic weight matrix formed from the individual instances.
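The master-algorithm construction can be sketched as follows. This is a simplified rendering of the [BM07] reduction under my own naming conventions and a full-information loss model, not the paper's exact pseudocode: instance `i` learns what to play "instead of" action `i`, the master plays the stationary distribution `p` of the matrix of instance recommendations, and instance `i` is charged the round's loss vector scaled by `p[i]`.

```python
import numpy as np

def stationary(Q, iters=1000):
    """Stationary distribution of a row-stochastic matrix Q via power iteration."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
    return p / p.sum()

def swap_regret_play(loss_fn, n_actions, T, eta=0.1):
    """Blum-Mansour-style reduction: one Hedge instance per action.

    loss_fn(t) returns the loss vector over actions at round t.
    Row i of W holds the Hedge weights of the instance responsible
    for deviations from action i. Returns the final played distribution.
    """
    W = np.full((n_actions, n_actions), 1.0 / n_actions)
    for t in range(T):
        p = stationary(W)                    # master's play: fixed point p = p W
        losses = loss_fn(t)
        for i in range(n_actions):
            # instance i sees the loss scaled by how often the master
            # "consulted" it, i.e. by p[i]
            W[i] = W[i] * np.exp(-eta * p[i] * losses)
        W = W / W.sum(axis=1, keepdims=True)  # keep rows stochastic
    return stationary(W)
```

The fixed-point choice `p = p @ W` is the crux: it makes the master's own play consistent with the recommendations of the per-action instances, which is what converts the external-regret guarantees of the individual instances into a swap-regret guarantee for the master.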
There are many details I have glossed over here, but I hope I have given readers a sense of the rich, beautiful and practical theory lying in wait for those willing to explore further.
References
- [A87] Correlated equilibrium as an expression of Bayesian rationality, Robert Aumann, Econometrica: Journal of the Econometric Society, 1987.
- [AHK12] The multiplicative weights update method: a meta-algorithm and applications, Sanjeev Arora, Elad Hazan and Satyen Kale, Theory of Computing, volume 8, 2012.
- [BM07] From external to internal regret, Avrim Blum and Yishay Mansour, Journal of Machine Learning Research, volume 8, 2007.
- [CL06] Prediction, Learning, and Games, Nicolo Cesa-Bianchi and Gabor Lugosi, CUP, 2006.
- [EGMNRW15] Fast rates in statistical and online learning, Tim van Erven, Peter Grunwald, Nishant Mehta, Mark Reid and Robert Williamson, Journal of Machine Learning Research, 2015.
- [EGR12] Catching up faster by switching sooner: A predictive approach to adaptive estimation with an application to the AIC–BIC dilemma, Tim van Erven, Peter Grunwald and Steven de Rooij, Journal of the Royal Statistical Society Series B: Statistical Methodology, volume 74, 2012.
- [ERG07] Catching up faster in Bayesian model selection and model averaging, Tim van Erven, Steven de Rooij and Peter Grunwald, NeurIPS, 2007.
- [GO17] Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it, Peter Grunwald and Thijs van Ommen, Bayesian Analysis, volume 12, 2017.
- [HEK18] The many faces of exponential weights in online learning, Dirk van der Hoeven, Tim van Erven and Wojciech Kotlowski, COLT, 2018.
- [HW98] Tracking the best expert, Mark Herbster and Manfred Warmuth, Machine Learning, 1998.
- [KR13] Universal codes from switching strategies, Wouter Koolen and Steven de Rooij, IEEE Transactions on Information Theory, volume 59, 2013.
- [OLL17] Soft-Bayes: Prod for mixtures of experts with log-loss, Laurent Orseau, Tor Lattimore and Shane Legg, ALT, 2017.
- [V97] Derandomizing stochastic prediction strategies, Vladimir Vovk, COLT, 1997.
- [VNHUS09] A Monte-Carlo AIXI Approximation, Joel Veness, Kee Siong Ng, Marcus Hutter, William Uther and David Silver, Journal of Artificial Intelligence Research, volume 40, 2011.
- [VW98] Switching between two universal source coding algorithms, Paul AJ Volf and Frans M.J. Willems, Data Compression Conference, 1998.
- [WST95] The context-tree weighting method: basic properties, F.M.J. Willems, Y.M. Shtarkov and T.J. Tjalkens, IEEE Transactions on Information Theory, volume 41, 1995.
- [YNH24] Dynamic Knowledge Injection for AIXI Agents, Samuel Yang-Zhao, Kee Siong Ng and Marcus Hutter, AAAI, 2024.
- [YWN22] A Direct Approximation of AIXI Using Logical State Abstractions, Samuel Yang-Zhao, Tianyu Wang and Kee Siong Ng, NeurIPS, 2022.