Artificial Intelligence, posed as a well-defined mathematical problem, was solved a number of years ago through the formulation of the AIXI agent by Prof Marcus Hutter (see https://theconversation.com/to-create-a-super-intelligent-machine-start-with-an-equation-20756 for a quick introduction). A fundamental issue with the AIXI theory, however, has always been the incomputability of the general solution.
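For readers who have not seen it, AIXI's action-selection rule is, roughly (in the notation Hutter uses),

$$ a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_t + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} $$

where the inner sum ranges over all programs q for a universal Turing machine U that are consistent with the agent's history of actions, observations and rewards, each weighted by 2 raised to minus the program length ℓ(q). It is that sum over all programs that makes the general solution incomputable and forces any practical agent to approximate.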
This continues work that started more than 10 years ago on approximating AIXI with Context Tree Weighting and Monte Carlo Tree Search. I am happy to report that my two PhD students, Samuel Yang-Zhao and Tianyu Wang, have now successfully extended the class of models that AIXI can be approximated over to essentially all (efficiently) computable functions. The caveat is that there is now a highly non-trivial feature-selection problem, which we solve heuristically using ideas from state-abstraction theory, Binary Decision Diagrams and Random Forests.
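To make the Context Tree Weighting side of this concrete, here is a minimal, self-contained sketch of a depth-bounded binary CTW predictor, the kind of model at the heart of the earlier MC-AIXI-CTW work. It is a toy for exposition only, with simplifications of my choosing (zero-padding of the initial context and a copy-based prediction step); it is not the implementation used in the paper.

```python
import math
from copy import deepcopy

class CTWNode:
    """One node of the context tree: KT counts plus mixed log-probability."""
    def __init__(self):
        self.a = 0          # number of 0s observed in this context
        self.b = 0          # number of 1s observed in this context
        self.log_kt = 0.0   # log of the Krichevsky-Trofimov estimate at this node
        self.log_w = 0.0    # log of the CTW mixture probability at this node
        self.children = {}  # symbol (0/1) -> child node, one context bit deeper

def log_half_sum(x, y):
    """Numerically stable log(0.5*exp(x) + 0.5*exp(y))."""
    m = max(x, y)
    return m + math.log(0.5 * math.exp(x - m) + 0.5 * math.exp(y - m))

class CTW:
    """Depth-bounded binary Context Tree Weighting predictor (toy sketch)."""
    def __init__(self, depth):
        self.depth = depth
        self.root = CTWNode()

    def update(self, history, bit):
        """Observe `bit` given the last `depth` bits of `history` as context."""
        context = history[-self.depth:]
        path = [self.root]                        # root -> deepest context node
        for sym in reversed(context):             # most recent bit indexes first
            path.append(path[-1].children.setdefault(sym, CTWNode()))
        for node in reversed(path):               # update leaf first, root last
            count = node.b if bit == 1 else node.a
            node.log_kt += math.log((count + 0.5) / (node.a + node.b + 1.0))
            if bit == 1:
                node.b += 1
            else:
                node.a += 1
            if node is path[-1]:                  # deepest node: KT estimate only
                node.log_w = node.log_kt
            else:                                 # mix KT with the children's product
                node.log_w = log_half_sum(
                    node.log_kt,
                    sum(c.log_w for c in node.children.values()))

    def predict(self, history, bit):
        """P(next = bit | history), via a throwaway copy of the tree."""
        before = self.root.log_w
        trial = deepcopy(self)
        trial.update(history, bit)
        return math.exp(trial.root.log_w - before)

# Tiny usage example: learn an alternating pattern.
ctw, history = CTW(depth=4), [0, 0, 0, 0]         # zero-pad the initial context
for b in [0, 1, 0, 1, 0, 1, 0, 1]:
    ctw.update(history, b)
    history.append(b)
print(ctw.predict(history, 0))                    # noticeably above 0.5: a 0 is expected next
```

The usage example at the bottom shows the mixture quickly picking up an alternating bit pattern; the appeal of CTW is that it computes a Bayesian mixture over all bounded-depth context trees exactly and incrementally.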
There is still more work to be done, but in the meantime please enjoy this paper, which has been accepted at this year’s NeurIPS conference.
https://arxiv.org/abs/2210.06917
Here’s the paper abstract:
We propose a practical integration of logical state abstraction with AIXI, a Bayesian optimality notion for reinforcement learning agents, to significantly expand the model class that AIXI agents can be approximated over to complex history-dependent and structured environments. The state representation and reasoning framework is based on higher-order logic, which can be used to define and enumerate complex features on non-Markovian and structured environments. We address the problem of selecting the right subset of features to form state abstractions by adapting the Φ-MDP optimisation criterion from state abstraction theory. Exact Bayesian model learning is then achieved using a suitable generalisation of Context Tree Weighting over abstract state sequences. The resultant architecture can be integrated with different planning algorithms. Experimental results on controlling epidemics on large-scale contact networks validate the agent’s performance.
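For a rough feel of how the pieces in the abstract fit together, here is a hypothetical, heavily simplified sketch of the pipeline: a set of features (standing in for the higher-order logic features of the paper) defines an abstraction map phi from histories to abstract states, a model is learned over the resulting abstract state and reward sequence, and a planner chooses actions against that model. All names and components below (phi, AbstractModel, act, the frequency-count model used in place of the generalised Context Tree Weighting, and the one-step greedy planner used in place of a proper planning algorithm) are illustrative assumptions rather than the paper's implementation.

```python
import random
from collections import defaultdict

# Hypothetical stand-ins for the paper's higher-order logic features:
# here each "feature" is just a Python predicate over the raw history,
# which in this toy setting is a list of (action, reward) pairs.
FEATURES = [
    lambda h: len(h) % 2 == 0,                  # parity of the history length
    lambda h: bool(h) and h[-1][1] > 0,         # was the last reward positive?
]

def phi(history, features):
    """Abstraction map: history -> abstract state (tuple of feature values)."""
    return tuple(f(history) for f in features)

class AbstractModel:
    """Frequency-count stand-in for a Bayesian model over abstract state
    sequences: estimates P(reward, next state | state, action) from counts."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, reward, next_state):
        self.counts[(state, action)][(reward, next_state)] += 1

    def expected_reward(self, state, action):
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        if total == 0:
            return 0.0                           # neutral prior for unseen pairs
        return sum(r * n for (r, _), n in outcomes.items()) / total

def act(model, state, actions, epsilon=0.1):
    """One-step greedy planner with epsilon-exploration; a real agent would
    instead plug the learned model into a proper planning algorithm."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: model.expected_reward(state, a))

# Toy interaction loop with a dummy environment that rewards action 1.
model, history, actions = AbstractModel(), [], [0, 1]
for t in range(200):
    state = phi(history, FEATURES)
    action = act(model, state, actions)
    reward = 1 if action == 1 else 0             # hypothetical environment
    history.append((action, reward))
    model.update(state, action, reward, phi(history, FEATURES))
print(model.expected_reward(phi(history, FEATURES), 1))  # typically close to 1
```

In the paper itself, the features are defined in a higher-order logic, the subset forming the abstraction is selected with the Φ-MDP optimisation criterion, and the model over abstract state sequences is learned exactly with a generalisation of Context Tree Weighting, as described in the abstract above.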