DeepSeek and All That

I wrote the following comments very quickly on LinkedIn about 10 days ago, in the middle of the DeepSeek frenzy, and the post turned out to be my second most-read post ever; people continue to look at it. I think the comments are holding up well, so I am sharing them here as well, plus a link to the best full explanation I have found so far. So here it goes (with some minor edits):

First of all, we should distinguish between the technical advances in the DeepSeek-V3 model and the DeepSeek-R1 model. The former is an achievement in Model Pre-training using unlabelled data, where the company showed it is possible to push the boundary on a few known techniques (mixture-of-experts architectures, quantisation, and low-rank compressed vector approximations) to essentially reproduce state-of-the-art LLMs using only ~2,000 GPUs at an estimated cost of around US$5.6m. That is a small fraction of the estimated US$100–400 million and the tens of thousands of GPUs that the likes of OpenAI, Meta, and Anthropic require for their efforts.
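To give a flavour of the first of those techniques, here is a minimal sketch of mixture-of-experts routing: each token is sent to only the top-k scoring experts, so most of the network's parameters are skipped on any given forward pass. This is purely illustrative (names and shapes are my own); real MoE layers, including DeepSeek-V3's, add shared experts, load-balancing losses, and batched GPU execution.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to the top-k experts by gate score.

    Illustrative sketch only, not DeepSeek's implementation.
    """
    scores = gate_w @ x                       # one gating score per expert
    top = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the selected experts
    # Only the chosen experts run; the rest are skipped, saving compute.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

d, n_experts = 8, 4
# Each "expert" is just a random linear map for illustration.
experts = [(lambda W: (lambda x: W @ x))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
gate_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)
y = moe_forward(x, gate_w, experts)           # same shape as the input token
```

The economic point is visible even in this toy: with k=2 of 4 experts active, half the expert parameters are untouched per token, and the savings grow as the expert count does.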

The DeepSeek-R1 model, in contrast, is an achievement in Model Post-training, i.e. the fine-tuning of a model obtained from the Pre-training step using labelled data and human domain knowledge. Here the company showed it is possible, using reinforcement learning with (presumably hand-crafted) rule-based reward functions, to steer a pre-trained model towards quality long-chained reasoning when answering questions. The rule-based reward functions appear to capture “syntactic” properties of what good reasoning steps look like, rather than the actual details of each reasoning step, thus removing the need for the large amount of human-labelled data usually required for Post-training. This reminds me of books that teach high-level problem-solving strategies rather than the specifics of a problem domain (e.g. Polya’s How to Solve It and Page’s The Model Thinker), and is widely believed to be what OpenAI researchers meant by “thinking tokens”.
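To make the “syntactic” point concrete, here is a hypothetical sketch of such a rule-based reward: it scores only the shape of a completion (did the model produce a delimited reasoning trace and a delimited answer?), never the content of the reasoning itself. The tag names and scoring weights are my own assumptions, in the spirit of the format rewards DeepSeek describe; a real setup would combine this with an accuracy reward on verifiable answers.

```python
import re

def format_reward(completion: str) -> float:
    """Reward the *shape* of the output, not its content.

    Hypothetical sketch: tags and weights are illustrative, not
    DeepSeek's actual reward function.
    """
    reward = 0.0
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if think and think.group(1).strip():
        reward += 0.5    # produced a non-empty reasoning trace
    if answer and answer.group(1).strip():
        reward += 0.5    # produced a clearly delimited final answer
    return reward

good = "<think>2+2 is 4 because each pair sums to 4.</think><answer>4</answer>"
bad = "The answer is 4."
# format_reward(good) == 1.0, format_reward(bad) == 0.0
```

Because such a function can be evaluated automatically on every sampled completion, no human labeller is needed in the loop, which is exactly the labour-saving property described above.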

In addition to advances in test-time inference algorithms, these two advances in Model Pre-training and Post-training, while technically incremental, can be considered a breakthrough from a geo-economic perspective: they may open the door to the more-or-less complete commoditisation of Generative AI technologies, thus levelling the playing field for all involved.

Exciting times.

That’s what I wrote in 30 minutes. And if you have 4 hours to understand the issue better, I would highly recommend this episode of Lex Fridman’s podcast.
