228. DeepSeek Has Been Inevitable and Here's Why (History Tells Us)
DeepSeek was certain to happen. The only unknown was who was going to do it. The choices were a startup or someone outside the current center of US AI leadership and innovation.
TL;DR for this article: DeepSeek was certain to happen. The only unknown was who was going to do it. The choices were a startup or someone outside the current center of leadership and innovation in AI, which is mostly in the US and clustered around trillion-dollar companies. It turned out to be a group in China, which for many (me too) is unfortunate. But again, it absolutely was going to happen. The next question is whether the US makers will see this with clarity.
Something we used to banter about when things seemed really bleak at Microsoft: when normal companies scope out features and architecture, they use t-shirt sizes small, medium, and large. In those days, Microsoft seemed capable of thinking only in terms of extra-large, huge, and ginormous. That's where we are with AI today and the big company approach in the US.
There's more in The Short Case for Nvidia Stock, which is very good but focuses on picking stocks, and that isn't my thing. Strategy and execution are more my thing, so here's that perspective.
The current trajectory of AI, if you read the news in the US, is one of MASSIVE CapEx piled on top of even more MASSIVE CapEx. It is a race between Google, Meta, OpenAI/Microsoft, xAI, and to a lesser extent a few other super well-funded startups like Perplexity and Anthropic. All of these together are taking the same approach, which I will call "scale up". Scale up is what you do when you have access to vast resources, as all of these companies do.
The history of computing is one of innovation followed by scale up, which is then broken by a model that "scales out", when a bigger and faster approach is replaced by smaller and more numerous approaches. Mainframe->Mini->Micro->Mobile, Big iron->Distributed computing->Internet, Cray->HPC->Intel/CISC->ARM/RISC, OS/360->VMS->Unix->Windows NT->Linux, and on and on. You can see this at these macro levels, or you can see it at the micro level when it comes to subsystems from networking to storage to memory.
The past 5 years of AI have been bigger models, more data, more compute, and so on. Why? Because I would argue the innovation was driven by the cloud hyperscale companies, and they were destined to take the approach of doing more of what they already did. They viewed data for training and huge models as their way of winning and their unique architectural approach. The fact that other startups took a similar approach is just Silicon Valley at work: the people move around and optimize for different things at a micro scale without considering the larger picture (see the sociological and epidemiological term small area variation). They look to do what they couldn't do at their previous efforts, or what those previous efforts might have been overlooking.
The degree to which the hyperscalers believed in scale up is obvious when you look at all of them building their own silicon. As cool as this sounds, it has historically proven very, very difficult for software companies to build their own silicon. While many look at Apple as a success, Apple's lessons emerged over decades of not succeeding, PLUS they build a device, not just silicon. Apple learned from 68K, PPC, and Intel how to optimize a design for their scenarios. Those building AI hardware were solving their in-house scale up challenges. I would have always argued they could gain percentages at a constant factor but not anything beyond that.
Nvidia is there to help everyone not building their own silicon, and those that wish to build their own silicon but are also trying to meet immediate needs. As described in "The Short Case", Nvidia also has a huge software ecosystem advantage with CUDA, something they have honed for almost two decades. It is critically important to have an ecosystem, and they have been successful at that. This is why I wrote that I think the DIGITS project is far more interesting than simply a 4000 TOPS desktop (see my CES report).
So now where are we? Well, the big problem we have is that the big scale solutions, for all the progress, are consuming too much capital. But beyond that, the delivery to customers has been on an unsustainable path. It is a path that works against the history of computing, which is that the resources needed become free, not more expensive. The market for computing simply doesn't accept solutions that cost more, especially consumption-based pricing. We've seen Microsoft and Google do a bit of resetting with respect to pricing in the hopes of turning these massive CapEx efforts into direct revenue. I wrote at the time of the initial pricing announcements that there was no way this would be sustainable. It took about a year. Laudable goal for sure, but just not how business customers of computing work. At the same time, Apple is focused on the "mostly free" way of doing AI, but the results are at best mixed, and they still have a lot of CapEx going on.
Given all that, it was inevitable that someone was going to look at what was going on and build a scale out solution: one that does not require the CapEx to deliver to customers and might even include architectural approaches that use less CapEx to even build (e.g., train) the product.
The example that keeps running through my mind is how AT&T looked at the internet. In all the meetings we had with AT&T about building the "information superhighway", they were completely convinced of two things. First, the internet technologies being shown were toys; they were missing all the key features such as being connection-based or having QoS (quality of service). For more on toys, see [...] Is a Toy by me.
Second, they were convinced that the right way to build the internet was to take their phone network and scale it up. Add more hardware and more protocols and a lot more wires and equipment to deliver on reliability, QoS, and so on. They weren't alone. Europe was busy building out internet connectivity with ISDN over their telco networks. AT&T loved this because it took huge capital and relied on their existing infrastructure.
They were completely wrong. Cisco came along and delivered all those things on the IP-based network using toy software like DNS. Other toys like HTTP and HTML layered on top. Then came Apache, Linux, and a lot of browsers. Not only did the initial infrastructure prove to be the least interesting part, but it was also subsumed by a scale out approach from a completely different player that had previously mostly served weird university computing infrastructure. Cisco did not have tens of billions of dollars, nor did Netscape, nor did CERN. They used what they could to deliver the information superhighway. The rest is history.
As an example, there was a time when IBM measured the mainframe business by MIPS. The reality was they had a 90%-plus share of MIPS. But in practice they were selling/leasing MIPS (the acronym, not the chip company from Stanford) at ever decreasing prices, just like Intel sold transistors for less. This is all great until you can get MIPS for even less money elsewhere. Intel delivered those. Then ARM found an even cheaper way to deliver more. You get the picture. Repeat this for data storage and you have a great chapter from The Innovator's Dilemma.
The challenge remains that the current AI/hyperscale companies have only two models for bringing an exciting, even disruptive, technology to market.
First, you bundle the technology as part of what you already sell. This de-monetizes anyone trying to compete with you. Of course, regulators love to think of this as predatory pricing, but the problem is software has little marginal cost (uh oh) and the whole industry is made up of cycles of platforms absorbing more technology from others. It is both an uphill battle for big companies to try to sell separate things (the salespeople are busy selling the big thing) and an uphill battle to try to keep things separate, since someone is always going to integrate anyway. Windows did this with Internet Explorer. Word did this with Excel, or Excel did this with Word, depending on your point of view (see Hardcore Software for the details). The list is literally endless. It happens so often in the Apple ecosystem that it is called Sherlocking. The result effectively commoditizes a technology while maintaining a hold on distribution.
Second, you can compete by skipping the de-monetization step and going straight to commoditization. This approach is one that counts on the internet and gets distribution via the internet. Nearly everything running in the cloud today is built on this approach. It really starts with Linux but goes through everything from Apache to Git to Spark. The key with this approach, and what is so unique about it, is open source. Meta has done a fantastic job at being open source but still relies on the architectural model that requires tens of billions of dollars of CapEx. Now Meta, much like Google, could also justify that CapEx as a way of building tools to make their existing products better, with open-source Llama just a side effect and good for everyone. This is not unlike Google releasing all sorts of software from Chromium to Android. It is also what Google has specifically done to de-monetize Microsoft by offering Gmail, ChromeOS, and the suite of productivity tools (Google Docs was originally just free, presumably to de-monetize Office). They can do that because they monetize with services on top of the open source they release. Their magic is that the value add on top of open source is not open source and lives in their hyperscale data centers running their proprietary code using their proprietary data. By releasing their products as open source, they are essentially trying to commoditize AI. The challenge, however, is the cost. This is what happened with Hotmail, for example: it turns out that at massive scale even a 5MB free mailbox adds up to a lot of subsidies.
That's why we see all the early AI products from the companies spending billions taking one of two approaches: bundling or mostly open source. Those outside those two models are, in a sense, competing against the bundles and against the companies trying to de-monetize the bundles. They are caught in the middle.
The cost of AI, like the cost of everything from mainframe computing to X.25 connectivity before it, literally **forces** the market to develop an alternative that scales without the massive direct capital.
By all accounts the latest approach with DeepSeek seems to be that. The internet is filled with attempts to analyze just how much cheaper it was, how much less data it took, or how many fewer people were involved. In algorithmic complexity terms, these are all constant factor differences. The fact that DeepSeek runs on commodity and disconnected hardware and is open source is enough of a shot across the bow of the current approach of AI hyperscaling that it can be seen as "the way things will go".
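To make the constant-factor point concrete, here is a minimal sketch in Python. The cost model (compute roughly proportional to parameters times tokens) and every number in it are illustrative assumptions of mine, not DeepSeek's or anyone's actual figures.

```python
# Toy illustration of a "constant factor" improvement vs. a change in how cost scales.
# The cost model and all numbers below are illustrative assumptions, not real figures.

def training_cost(params: float, tokens: float, k: float) -> float:
    """Toy compute-cost model: cost grows with params * tokens, scaled by a coefficient k."""
    return k * params * tokens

baseline_k = 1.0    # hypothetical coefficient for the brute-force "scale up" approach
efficient_k = 0.1   # hypothetical 10x-cheaper coefficient from better engineering

for params in [1e9, 1e10, 1e11]:
    tokens = 20 * params  # illustrative tokens-per-parameter ratio, assumed for this sketch
    baseline = training_cost(params, tokens, baseline_k)
    efficient = training_cost(params, tokens, efficient_k)
    # The ratio is the same at every scale: a constant factor,
    # not a change in how cost grows as models get bigger.
    print(f"{params:.0e} params: baseline={baseline:.2e}, "
          f"efficient={efficient:.2e}, ratio={baseline / efficient:.0f}x")
```

The ratio comes out the same at every model size, which is the point: the reported savings are a fixed multiplier on the same curve, not a different curve.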
I admit this is all confirmation bias for me. We've had a weekend with DeepSeek, and people are still poring over it. The hyperscalers and Nvidia have massive technology roadmaps. I am not here for stock predictions at all. All I know for sure is that if history offers any advice to technologists, it is that core technologies become free commodities, and because of internet distribution and de facto market standardization at many layers, that happens sooner with every turn of the crank.
China faced an AI situation not unlike Cisco's. Many (including "The Short Case") are looking at the Nvidia embargo as a driver. The details don't really matter. It is just that they had different constraints. They had many more engineers to attack the problem than they had data centers to train on. They were inevitably going to create a different kind of solution. In fact, I am certain someone somewhere would have. It is just that, especially in hindsight, China was especially well-positioned.
Kai-Fu Lee argued today that this was because China would out-engineer the US. Nonsense, I say. That is just trash talk. China took an obvious and clever approach that US companies were mostly blind to because of the path that got them to where they are today, before AI. This is just a wake-up call.
I'm confident many in the US will see the course corrections that should happen. The next Cisco for AI is waiting to be created, I'm sure. If that doesn't happen, then it could also end up the way browsers did, with a big company (or three) simply bundling it for everyone to use. Either way, the commoditization step is upon us.
Get building. Scale out, not up. 🚀