In this post on the last day of the year, I thought I would share a map of mathematical structures that are useful for thinking about knowledge representation and reasoning (KRR) issues in Artificial Intelligence and Machine Learning. It is built on top of the diagram shown in Max Tegmark’s paper Is “the theory of everything” merely the ultimate ensemble theory? and extended with my own understanding of historical and recent work across quite a few different fields of AI. As such, it is necessarily biased towards my own personal experience and taste.

Generally speaking, each arrow involves the addition of some new symbols and the axioms that provide their definitions and/or properties. Some boxes have multiple incoming arrows; these are systems constructed from the union of multiple sets of new symbols and axioms. Note also that the relationships represented by the arrows are, in general, transitive.
Essentially all the systems have a syntax, a semantics in the style of Tarski or Kripke, and a Hilbert-style proof procedure. The expressiveness of the different systems is tightly connected: the progression from propositional logic (Boolean algebra) to first-order logic (predicate calculus), second-order logic (Natural numbers) and ultimately higher-order logic (Type theory) is a well-studied area, albeit one with many intricacies.

We care primarily about three properties: soundness, completeness, and decidability. Soundness asks whether every statement shown to be true by the proof procedure via a syntactic proof is indeed true in the semantics of the logic. Completeness asks whether every statement that is true in the semantics of the logic has a syntactic proof in the proof procedure. Lastly, decidability asks whether the proof procedure can effectively decide whether an arbitrary given statement is true or false. Propositional logic is sound, complete and decidable. First-order logic is sound and complete but not decidable. Higher-order logic (which includes second-order logic) with standard semantics is sound but incomplete, as shown by Gödel’s Incompleteness Theorem. In practice, one can achieve completeness for higher-order logic by adopting Henkin semantics, in which case one can show that higher-order logic is effectively equivalent to (many-sorted) first-order logic. (More technically, the domain of a type a -> b in a model in standard semantics consists of all functions from the domain of a to the domain of b, whereas a general model in Henkin semantics only needs to contain a non-empty subset of the functions from the domain of a to the domain of b. By focussing on general models with elements that are nameable explicitly in the syntax of higher-order logic, Henkin was able to show that every consistent theory has a general model, which then opens the door to proving the contrapositive of the Completeness Theorem for higher-order logic; see [SV] for a clear and succinct description.)

In the context of knowledge representation and reasoning for AI, Henkin semantics is appropriate for higher-order logic, in which case we get the best of both worlds: an expressive language similar to informal mathematics for representing and reasoning about the world, while retaining the underlying soundness and completeness of first-order logic. (Decidability is not achievable for any but the simplest logics.)
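To make the decidability of propositional logic concrete, here is a minimal sketch of a truth-table decision procedure in Python (the formula encoding and function names are my own, purely for illustration): validity is decided by exhaustively evaluating the formula under all 2^n truth assignments.

```python
from itertools import product

# Formulas are nested tuples like ('->', ('and', 'p', 'q'), 'p');
# atoms are plain strings. Validity = true under every assignment.

def atoms(f):
    """Collect the propositional atoms occurring in a formula."""
    if isinstance(f, str):
        return {f}
    op, *args = f
    return set().union(*(atoms(a) for a in args))

def eval_formula(f, v):
    """Evaluate formula f under truth assignment v (a dict atom -> bool)."""
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == 'not':
        return not eval_formula(args[0], v)
    if op == 'and':
        return eval_formula(args[0], v) and eval_formula(args[1], v)
    if op == 'or':
        return eval_formula(args[0], v) or eval_formula(args[1], v)
    if op == '->':
        return (not eval_formula(args[0], v)) or eval_formula(args[1], v)
    raise ValueError(op)

def is_valid(f):
    """Decide validity by checking all 2^n rows of the truth table."""
    names = sorted(atoms(f))
    return all(eval_formula(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))

# Peirce's law ((p -> q) -> p) -> p is valid; p -> q is not.
assert is_valid(('->', ('->', ('->', 'p', 'q'), 'p'), 'p'))
assert not is_valid(('->', 'p', 'q'))
```

No such exhaustive procedure exists for first-order logic, where the space of interpretations is infinite; that is precisely the decidability that is lost as we move up the diagram.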
In the diagram, I have also shown how higher-order logic can be extended to modal higher-order logic [KrrMHL] and probabilistic higher-order logic [PoRL,PoEL]; the two extensions are orthogonal and it is possible to construct probabilistic multi-modal higher-order logic, where the interpretations are Kripke structures and there is a probability measure over those interpretations. Propositional logic and first-order logic can be extended in a similar way; the former extensions are shown in the diagram but the latter are not, to avoid clutter in the middle of the diagram. All these logics can be shown to be sound and complete using Henkin semantics.
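To give a feel for the semantic idea behind the probabilistic extensions, here is a toy Python sketch in the propositional case (the atoms and weights are made up for illustration): interpretations are possible worlds, a probability measure assigns each world a weight, and the probability of a sentence is the total weight of the worlds that satisfy it.

```python
from itertools import product

def sentence_prob(sentence, weights, names):
    """Sum the weights of the truth assignments (worlds) satisfying
    `sentence`, given as a predicate over a dict atom -> bool."""
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if sentence(world):
            total += weights[tuple(bits)]
    return total

names = ['rain', 'wet']
# A hypothetical joint distribution over the four possible worlds.
weights = {(False, False): 0.5, (False, True): 0.1,
           (True, False): 0.05, (True, True): 0.35}

print(sentence_prob(lambda w: w['rain'], weights, names))                  # P(rain) = 0.4
print(sentence_prob(lambda w: not w['rain'] or w['wet'], weights, names))  # P(rain -> wet) = 0.95
```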
In AI, formal logics can be used primarily in one of two ways: (1) as a formal language for agent designers to specify systems, including multi-agent systems, and prove properties about them; (2) as a formal knowledge representation language used inside an agent for it to represent and reason about the world. Generally speaking, modal logics and higher-order logics are usually adopted for the former, and first-order logics for the latter. The application of modal logics to modelling and reasoning about games [LiG] and multi-agent systems [RaRA] has been an especially fruitful area, as has the application of higher-order logic to modelling and proving properties of highly complex software and hardware systems [Isabelle, seL4]. A notable exception is Lloyd’s modal higher-order logic, which is designed to be used inside an agent for learning and reasoning [LfL].
There are also different boxes that provide equivalent ways of defining computability, including logic, lambda calculus, and Turing machines. Naturally, there are connections between some of the boxes and programming paradigms and actual programming languages, for example SQL (relational algebra), SAT solvers (Boolean algebra), logic programming variants like Prolog and Answer Set Programming (predicate calculus), system and ontology specification languages like UML/SysML and OWL (description logic), functional languages like ML and Haskell (higher-order logic), and imperative languages like C++ and Java (von Neumann machines, which extend universal Turing machines).
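As a small illustration of the logic-programming box, the following is a naive forward-chaining evaluator for Datalog-style Horn clauses, sketched in Python rather than Prolog (the term encoding and function names are my own), computing the transitive closure of a parent relation:

```python
# Rules are (head, body) pairs of atoms; atoms are tuples like
# ('parent', 'X', 'Y'); variables start with an uppercase letter.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(pattern, fact, subst):
    """Extend substitution `subst` so that `pattern` matches ground `fact`."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    subst = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if subst.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return subst

def solve(body, facts, subst):
    """Yield all substitutions making every atom in `body` a known fact."""
    if not body:
        yield subst
        return
    for fact in facts:
        s = unify(body[0], fact, subst)
        if s is not None:
            yield from solve(body[1:], facts, s)

def forward_chain(rules, facts):
    """Apply the rules repeatedly until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for s in list(solve(body, facts, {})):
                new = (head[0],) + tuple(s.get(t, t) for t in head[1:])
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

# ancestor(X,Y) :- parent(X,Y).
# ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
rules = [(('ancestor', 'X', 'Y'), [('parent', 'X', 'Y')]),
         (('ancestor', 'X', 'Z'), [('parent', 'X', 'Y'), ('ancestor', 'Y', 'Z')])]
facts = {('parent', 'ann', 'bob'), ('parent', 'bob', 'cal')}
print(sorted(f for f in forward_chain(rules, facts) if f[0] == 'ancestor'))
```

A real Prolog or Answer Set Programming system uses far more sophisticated evaluation strategies, but this fixed-point computation is the essential semantic idea.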
The study of mathematical structures like manifolds and Hilbert spaces near the top of the diagram has been instrumental in many advances in the theory and practice of statistical machine learning, giving us a good understanding of dual representations of optimisation problems (through so-called Representer Theorems), different types of iterative gradient-based optimisation algorithms, and guidance on when to use which algorithms on which representations. Many of these ideas and algorithms now underlie the astonishing results we are getting from deep neural networks.
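As one concrete instance of such a dual representation, here is a minimal kernel ridge regression sketch in Python with NumPy (the data, kernel and regularisation parameter are made up for illustration). The Representer Theorem guarantees that the minimiser lives in the span of kernel functions centred at the training points, which is what makes the finite-dimensional dual solve below possible:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))                   # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)    # noisy targets

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Dual solution: f(x) = sum_i alpha_i k(x_i, x), alpha = (K + lam I)^{-1} y.
lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(-3, 3, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha   # predictions in the span of k(x_i, .)
print(np.round(y_pred, 3))
```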
The abstract algebra structures shown on the left of the diagram have historically been studied in cryptography. Recent advances in lattice-based cryptography, drawing in particular on algebraic number theory, have given rise to reasonably efficient homomorphic encryption schemes based on the Ring Learning with Errors problem. The use of such homomorphic encryption schemes inside Machine Learning algorithms is providing us with the opportunity to make significant advances in Privacy-Preserving Machine Learning, addressing some of the issues being considered in the Responsible Use of AI.
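To give a feel for what “homomorphic” means, here is a toy Python sketch using textbook RSA with tiny parameters; this is emphatically not Ring-LWE and not secure, but RSA’s multiplicative homomorphism shows the core idea of computing on data while it stays encrypted:

```python
# Toy textbook RSA (insecure, tiny parameters) for illustration only.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 7, 12
c = (enc(a) * enc(b)) % n      # multiply the ciphertexts only...
assert dec(c) == (a * b) % n   # ...and recover the product of the plaintexts
print(dec(c))                  # 84
```

Schemes based on Ring Learning with Errors support both additions and multiplications on ciphertexts, which is what makes evaluating machine learning models on encrypted data feasible.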
On the upper right of the diagram, we have Probabilistic Graphical Models and the special case of Markov Decision Processes. These structures are not as foundational to mathematics as the other structures shown in the diagram, but they are absolutely foundational to Machine Learning and Artificial Intelligence. Historically, Probabilistic Graphical Models have served as a successful alternative to logic-based AI and delivered many practical applications in a range of areas. In the agent context, the study of Hidden Markov Models (HMMs) yielded widely used tracking algorithms like Kalman Filters, Extended Kalman Filters, Particle Filters, and their multi-sensor multi-target versions, including recent results on filtering algorithms for random finite sets and factor graphs. Markov Decision Processes (MDPs) are HMMs augmented with action nodes and reward observations, and the study of MDPs and partially observable MDPs lies at the heart of the design and implementation of intelligent AI agents that can reason and learn to act optimally to maximise long-term expected reward.
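To make the MDP picture concrete, here is a minimal value iteration sketch in Python (the two-state MDP, its transitions and rewards are made up for illustration), computing the optimal value function and a greedy policy by iterating the Bellman optimality operator:

```python
# P[s][a] = list of (probability, next_state, reward) outcomes.
P = {
    0: {'stay': [(1.0, 0, 0.0)], 'go': [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {'stay': [(1.0, 1, 1.0)], 'go': [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor

# Value iteration: apply the Bellman optimality update until it converges.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

# Greedy policy with respect to the converged value function.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)  # state 1 prefers 'stay' (it collects reward); state 0 prefers 'go'
```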
Some of the most exciting AI work happens at the intersection of probabilistic graphical models (in particular POMDPs), logic (first-order and higher-order logics), causal inference (the do-calculus), and privacy technologies (homomorphic encryption and differential privacy). I believe a deep and broad understanding of many of the different mathematical structures listed in the diagram above is required to carry out such research, which may give us the best chance of achieving practical and responsible AI in a scalable manner.
References:
[Isabelle] Tobias Nipkow, Lawrence C. Paulson, Markus Wenzel, Isabelle/HOL: A Proof Assistant for Higher-Order Logic, Springer, 2002.
[KrrMHL] John Lloyd, Knowledge Representation and Reasoning in Modal Higher-order Logic, 2007.
[LfL] John Lloyd, Logic for Learning, Springer-Verlag, 2003.
[LiG] Johan van Benthem, Logic in Games, MIT Press, 2014.
[RaRA] Michael Wooldridge, Reasoning about Rational Agents, MIT Press, 2003.
[PoRL] Haim Gaifman and Marc Snir, Probabilities over rich languages, testing and randomness, Journal of Symbolic Logic, 1982.
[PoEL] Marcus Hutter et al, Probabilities on Sentences in an Expressive Logic, Journal of Applied Logic, 2013.
[seL4] Gerwin Klein et al, seL4: Formal Verification of an OS Kernel, Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), 2009.
[SV] William Farmer, The Seven Virtues of Simple Type Theory, Journal of Applied Logic, 2008.
Hi, nice article! I was wondering what you used to generate the diagram? I’ve been looking into different approaches but your diagram is cleaner than others I could find. Any tips would be great 🙂
Thanks. The diagram is generated using TikZ in LaTeX. Hope that helps.