Puzzles and Mysteries in Generative AI

Of the many questions we wish to answer using LLMs, it can be useful to distinguish between puzzles and mysteries. As Gregory Treverton explained in his many articles, a puzzle is a problem that has a definite and verifiable answer, but a mystery is one that “poses a question that has no definitive answer because the answer is contingent; it depends on a future interaction of many factors, known and unknown.” In particular, “A mystery cannot be answered; it can only be framed, by identifying the critical factors and applying some sense of how they have interacted in the past and might interact in the future. A mystery is an attempt to define ambiguities.”

Here’s Malcolm Gladwell, in his characteristically eloquent prose, in the 2007 New Yorker article “Open Secrets: Enron, intelligence, and the perils of too much information”: “Osama bin Laden’s whereabouts are a puzzle. We can’t find him because we don’t have enough information. The key to the puzzle will probably come from someone close to bin Laden, and until we can find that source bin Laden will remain at large. The problem of what would happen in Iraq after the toppling of Saddam Hussein was, by contrast, a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much. The CIA had a position on what a post-invasion Iraq would look like, and so did the Pentagon and the State Department and Colin Powell and Dick Cheney and any number of political scientists and journalists and think-tank fellows. For that matter, so did every cabdriver in Baghdad.”

A mathematical (but likely incomplete) characterisation is that puzzles are problems in the class NP (Non-deterministic Polynomial time), whose defining property is that if a candidate solution is given, its correctness can be verified efficiently, in polynomial time. (This doesn’t mean the problem is necessarily easy to solve: finding the candidate solution in the first place may still take a very long time.) Mysteries, in contrast, are closer to NP-hard problems that sit outside NP: not only is finding a candidate solution computationally expensive, but even verifying whether a candidate solution is correct is computationally intractable. (My understanding is that NEXP is the only class of problems to have been proven to be a strict superset of NP, via the non-deterministic time hierarchy theorem.)
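The “cheap to verify, expensive to solve” asymmetry that defines a puzzle can be made concrete with a toy example. The sketch below (my illustration, not from the text) uses Subset Sum, a classic NP-complete problem: the brute-force solver is exponential in the size of the input, but checking a proposed certificate takes only a single pass.

```python
# Toy illustration of the NP "puzzle" property: verifying a candidate
# solution is cheap even when finding one may be expensive.
# The puzzle here is Subset Sum: does some subset of `numbers` sum to `target`?

from itertools import combinations
from typing import Optional, Sequence

def verify_certificate(numbers: Sequence[int], target: int,
                       candidate: Sequence[int]) -> bool:
    """Polynomial-time check that `candidate` is a valid certificate:
    it must be drawn from the instance and sum to the target."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)          # respect multiplicities
    return sum(candidate) == target

def solve_by_search(numbers: Sequence[int], target: int) -> Optional[list]:
    """Brute-force solver: tries every subset, exponential in len(numbers)."""
    for r in range(len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None

numbers, target = [3, 34, 4, 12, 5, 2], 9
cert = solve_by_search(numbers, target)          # slow in general
print(cert, verify_certificate(numbers, target, cert))  # → [4, 5] True
```

For a mystery, by the analogy above, even the `verify_certificate` step is unavailable: there is no efficient procedure for checking that a proposed “answer” is correct.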

There has been a lot of excellent progress in the design and implementation of Generative AI tools and solutions for solving puzzles. In the programming space, we now have mature code generators like GitHub Copilot and increasingly mature integrated programming assistants like Devin and Replit that can perform at the level of a junior engineer as part of a software development team. In mathematics, Terence Tao and collaborators have made leaps and bounds in advancing collaborative human-machine mathematical research at scale in The Equational Theories Project, by combining human-generated proofs and machine-generated proofs (constructed using proof assistants and LLMs) in the formal language Lean. In game playing, the combination of deep reinforcement learning and game-theoretic search algorithms that can find Nash equilibria has produced superhuman performance in games like poker. These are just a small sample of what has been achieved.
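What makes the Lean workflow a puzzle-solving setting, in the sense above, is that the kernel acts as an efficient verifier: a proof is a certificate that can be checked mechanically regardless of who, or what, produced it. A toy example (mine, not from the Equational Theories Project) of a machine-checkable equational statement in Lean 4:

```lean
-- A machine-checkable equational statement in Lean 4.
-- Whether this proof term is written by a human or generated by an LLM,
-- the Lean kernel verifies it in exactly the same way.
example (x y : Nat) : x + y = y + x :=
  Nat.add_comm x y
```

This is why combining human-generated and machine-generated proofs at scale is feasible: acceptance into the corpus depends only on the verifier, not on trust in the proof’s author.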

What about Gen AI solutions for tackling mysteries? The most prominent product in that space is probably OpenAI’s Deep Research. There are domain-focussed solutions too, including Dragonfly Thinking, which uses structured frameworks, like company founder Anthea Roberts’ Risk, Reward and Resilience framework for integrative policy making in complex environments, to design guided workflows in which a human user interrogates LLMs to explore the problem and solution space in a systematic way. In the Defence context, LLMs can be used in a similar way as part of an established decision-making process like the Joint Military Appreciation Process to generate and analyse scenarios and possible courses of action. The Test-and-Evaluation problem for Gen AI solutions designed to tackle mysteries is itself a mystery (rather than a puzzle). The solution, by analogy, is to focus on the thinking process, in particular making sure we use a variety of mental models and structured frameworks to cover all the bases.
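The shape of such a guided workflow can be sketched in a few lines. This is a minimal, hypothetical illustration only: the lens questions below are my paraphrase of the Risk, Reward and Resilience idea, and `ask_llm` is a stand-in for whatever LLM API is actually in use, not a real product interface.

```python
# Minimal sketch of a framework-driven workflow for interrogating an LLM
# about a mystery. The lenses loosely follow the Risk / Reward / Resilience
# framing mentioned in the text; `ask_llm` is a hypothetical callable.

from typing import Callable, Dict

LENSES = {
    "Risk":       "What could go wrong, and what are the key uncertainties?",
    "Reward":     "What are the potential upsides, and for whom?",
    "Resilience": "How robust is each option to surprises and shocks?",
}

def explore_mystery(problem: str, ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Query the LLM once per lens, returning one analysis per framing.

    The value lies not in any single answer but in the coverage:
    each lens forces a different framing of the same ambiguous problem,
    which is all that "framing a mystery" allows.
    """
    return {
        lens: ask_llm(f"Consider this problem: {problem}\n{question}")
        for lens, question in LENSES.items()
    }

# Usage with a placeholder model:
analyses = explore_mystery(
    "Post-conflict stabilisation planning",
    ask_llm=lambda prompt: f"[model response to: {prompt[:40]}...]",
)
for lens, answer in analyses.items():
    print(lens, "->", answer)
```

Note that nothing here verifies the answers, which is precisely the point: for a mystery, evaluation falls back on the quality and variety of the framings rather than on checking a single output.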

