Dynamic Knowledge Injection for AIXI Agents

My PhD student just had a new paper accepted at the upcoming AAAI Conference on Artificial Intelligence. Here’s the abstract of the paper:

Prior approximations of AIXI, a Bayesian optimality notion for general reinforcement learning, can only approximate AIXI’s Bayesian environment model using an a-priori defined set of models. This is a fundamental source of epistemic uncertainty for the agent in settings where the existence of systematic bias in the predefined model class cannot be resolved by simply collecting more data from the environment. We address this issue in the context of Human-AI teaming by considering a setup where additional knowledge for the agent in the form of new candidate models arrives from a human operator in an online fashion. We introduce a new agent called DynamicHedgeAIXI that maintains an exact Bayesian mixture over dynamically changing sets of models via a time-adaptive prior constructed from a variant of the Hedge algorithm. The DynamicHedgeAIXI agent is the richest direct approximation of AIXI known to date and comes with good performance guarantees. Experimental results on epidemic control on contact networks validates the agent’s practical utility.
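
For readers who have not met Hedge before, here is a toy sketch of the generic Hedge (multiplicative weights) update over a set of models that can grow over time. It is not the DynamicHedgeAIXI construction from the paper. The class name `ToyHedge`, the learning rate `eta`, and in particular the rule that gives a newly arrived model a fixed `share` of the weight are all assumptions made for illustration.

```python
import math


class ToyHedge:
    """Generic Hedge weights over a model set that can grow (illustration only)."""

    def __init__(self, names, eta=0.5):
        self.eta = eta  # learning rate (assumed value)
        # Start from a uniform prior over the initial models.
        self.weights = {name: 1.0 / len(names) for name in names}

    def add_model(self, name, share=0.1):
        # Assumed rule: the newcomer takes a fixed share of the total mass,
        # and the existing weights are scaled down so the sum stays 1.
        for key in self.weights:
            self.weights[key] *= 1.0 - share
        self.weights[name] = share

    def update(self, losses):
        # losses maps model name -> loss in [0, 1] for the current step.
        for name, loss in losses.items():
            self.weights[name] *= math.exp(-self.eta * loss)
        # Renormalise so the weights form a probability distribution again.
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total


hedge = ToyHedge(["model_a", "model_b"])
hedge.update({"model_a": 1.0, "model_b": 0.0})   # model_b predicted better
hedge.add_model("model_c")                       # new model arrives online
hedge.update({"model_a": 1.0, "model_b": 0.5, "model_c": 0.0})
```

Models that incur lower loss gain weight at the expense of the others, and a model injected mid-stream can overtake the incumbents if it keeps predicting well.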

One of the reviewers gave us a Strong Accept and provided this commentary:

“This paper leapfrogs the development of AIXI agents by bringing practicality and implementation to a mostly theoretical tool. The profoundness of this paper is in its ability to bring computability to a largely theoretical tool. To the reviewer’s best knowledge, the profoundness of this paper cannot be overstated.

The paper gives a relatively good introduction to the topic. The paper is very easy to read for people used to reading complex mathematical arguments in Category theory.

The construction of the Dynamic Knowledge Injection Setting is well explained and marvelously constructed using well understood techniques.

The theoretical proofs naturally follow from the construction itself. The reviewer appreciates the simplicity and shortness of the proofs.

The reviewer is not sure whether any critique can be reasonably offered.”

We are truly humbled by that review and grateful for the recognition.

A copy of the paper can be found here.

