One of the really cool things about this job is just that when something like this happens, I get to kind of talk to everyone and everyone wants to talk and I feel like I've talked to maybe not everyone and like all the top people in AI, but it feels like most of them. And there's definitely a lot of takes all over the map on DeepSeek, but I feel like I've started to put together a synthesis based on hearing from the top people in the field.
It was a bit of a freak out. It's rare that a model release is going to be a global news story or wipe out a trillion dollars of market cap in one day. And so it is interesting to think about why this was such a potent news story. And I think it's because there are two things about that company that are different. One is that obviously it's a Chinese company rather than an American company, so you have the whole China versus US competition. And the other is that it's an open source company, or at least it open-sourced the R1 model.
And so you've kind of got the whole open source versus closed source debate. And if you take either one of those things out, it probably wouldn't have been such a big story. But I think the synthesis of these things got a lot of people's attention. A huge part of TikTok's audience, for example, is international. Some of them like the idea that the US may not win the AI race, that the US is kind of getting a comeuppance here. And I think that fueled some of the early attention on TikTok.
There's a lot of people who are rooting for open source, or they have animosity towards OpenAI. And so they were kind of rooting for this idea that, oh, there's this open source model that's going to give away what OpenAI has done at one-twentieth the cost. So I think all of these things provided fuel for the story.
Now, I think the question is, okay, what should we make of this? I mean, I think there are things that are true about the story and then things that are not true or should be debunked. The, let's call it, true thing here is that if you had said to people a few weeks ago that the second company to release a reasoning model along the lines of o1 would be a Chinese company, I think people would have been surprised by that.
So I think there was a surprise. And just to back up for people, there are two major kinds of AI models now. There's the base LLM, like ChatGPT-4o, or the DeepSeek equivalent, V3, which they launched a month ago. That's basically like a smart PhD: you ask a question, it gives you an answer. Then there's the new reasoning models, which are based on reinforcement learning, sort of a separate process as opposed to pre-training.
And o1 was the first model released along those lines. You can think of a reasoning model as a smart PhD who doesn't give you a snap answer, but actually goes off and does the work. You can give it a much more complicated question, and it'll break that complicated problem into a subset of smaller problems, and then it'll go step by step to solve the problem.
And that's called chain of thought, right? And so the new generation of agents that are coming are based on this idea of chain of thought: that an AI model can sequentially perform tasks and figure out much more complicated problems. So OpenAI was the first to release this type of reasoning model. Google has a similar model they're working on called Gemini 2.0 Flash Thinking. They've released kind of an early prototype of this called Deep Research 1.5. And Anthropic has something that I don't think they've released yet.
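To make the chain-of-thought idea concrete, here is a toy Python sketch. It is not real model code, just an illustration: the hard-coded plan stands in for the decomposition a reasoning model would generate itself, and each sub-problem is solved in sequence, with the intermediate steps visible the way a model's chain of thought is.

```python
# Toy illustration of chain of thought: instead of one snap answer,
# a reasoning loop breaks a problem into sub-steps and solves them
# sequentially. A real model generates its own decomposition; here
# `plan` hard-codes one for a single example question.

def plan(question):
    # Sub-problems the model would break the question into.
    return [
        ("price_per_unit", lambda s: 12.0),
        ("units", lambda s: 30),
        ("subtotal", lambda s: s["price_per_unit"] * s["units"]),
        ("total_with_tax", lambda s: s["subtotal"] * 1.10),
    ]

def solve_with_chain_of_thought(question):
    state = {}
    for name, step in plan(question):
        state[name] = step(state)                 # solve one sub-problem
        print(f"step: {name} = {state[name]}")    # the visible "thought"
    return state["total_with_tax"]

answer = solve_with_chain_of_thought(
    "What is the total cost of 30 units at $12 each, with 10% tax?"
)
```

The point of the structure is that each step only depends on the results of earlier steps, which is what lets a model tackle questions too complicated to answer in one shot.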
So other companies have similar models to o1, either in the works or in some sort of private beta, but DeepSeek was really the next one after OpenAI to release the full public version of it. And moreover, they open-sourced it. And so this created a pretty big splash. And I think it was legitimately surprising to people that the next big company to put out a reasoning model like this would be a Chinese company.
And moreover, that they would open source it, give it away for free. And I think the API access is something like one-twentieth the cost. So all of these things really did drive the news cycle, and I think for good reason. Because if you had asked most people in the industry a few weeks ago how far behind China is on AI models, they would have said six to 12 months. And now I think they might say something more like three to six months, right? Because o1 was released about four months ago, and R1 is comparable to that.
So I think it's definitely moved up people's timeframes for how close China is on AI. Now, let's take the claim that they only did this for six million dollars. On this one, I'm with Palmer Luckey and Brad Gerstner and others. And I think this has been pretty much corroborated by everyone I've talked to: that number should be debunked.
So first of all, it's very hard to validate a claim about how much money went into the training of this model. It's not something that we can empirically discover. But even if you accepted it at face value, that six million dollars was for the final training run. So when the media is hyping up these stories saying that this Chinese company did it for six million and these dumb American companies did it for a billion, it's not an apples-to-apples comparison. If you were to make the apples-to-apples comparison, you would need to compare the final training run cost by DeepSeek to that of OpenAI or Anthropic.
And what the founder of Anthropic said, and what I think Brad has said, being an investor in OpenAI and having talked to them, is that the final training run cost was more in the tens of millions of dollars, about nine or ten months ago. So it's not six million versus a billion. The billion-dollar number might include all the hardware they've bought over the years of putting into it, a holistic number as opposed to the training number.
Yeah, it's not fair to compare, let's call it a soup-to-nuts number, a fully loaded number, by American AI companies to the final training run by the Chinese company. But real quick, Sacks, you've got an open source model, and the white paper they put out there is very specific about what they did to make it and the results they got out of it. I don't think they give the training data, but you could start to stress test what they've already put out there and see if you can do it cheap, essentially.
Like I said, I think it is hard to validate the number. Let's just assume that we give them credit for the six million number; my point is less that they couldn't have done it, but just that we need to be comparing like to like. Yeah. So if, for example, you're going to look at the fully loaded cost of what it took DeepSeek to get to this point, then you would need to look at what has been the R&D cost to date of all the models and all the experiments and all the training runs they've done, right? And the compute cluster that they surely have.
Dylan Patel, who's a leading semiconductor analyst, has estimated that DeepSeek has about 50,000 Hoppers. And specifically, he said they have about 10,000 H100s, 10,000 H800s, and 30,000 H20s. Now, sorry, Sacks, is that DeepSeek, or is it DeepSeek plus the hedge fund? DeepSeek plus the hedge fund.
But it's the same founder, right? And by the way, that doesn't mean they did anything illegal, right? Because the H100s were banned under export controls in 2022, and then they did the H800s in 2023. But this founder was very farsighted. He was very ahead of the curve, and through his hedge fund, he was using AI to basically do algorithmic trading. So he bought these chips a while ago. In any event, you add up the cost of a compute cluster with 50,000-plus Hoppers, and it's going to be over a billion dollars. So this idea that you've got this scrappy company that did it for only six million is just not true; they have a substantial compute cluster that they use to train their models. And frankly, that doesn't count any chips they might have beyond the 50,000 that they might have obtained in violation of export restrictions, which obviously they're not going to admit to. And then we just don't know; we don't really know the full extent of what they have. So I just think it's worth pointing out that I think that part of the story got overhyped. It's hard to know what's fact and what's fiction. Everybody who's on the outside guessing has their own incentive, right? So if you're a semiconductor analyst that effectively is massively bullish on Nvidia, you want it to be true that it wasn't possible to train it for six million dollars.
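For what it's worth, the over-a-billion-dollars cluster figure is easy to sanity-check with a back-of-envelope calculation. The GPU counts below come from the Dylan Patel estimate mentioned above, but the per-GPU prices and the overhead multiplier are rough assumptions on my part, not numbers from the discussion.

```python
# Back-of-envelope on the cluster cost. GPU counts are from the
# estimate quoted in the discussion; unit prices are rough assumptions.
cluster = {
    # name: (count, assumed unit price in USD)
    "H100": (10_000, 30_000),
    "H800": (10_000, 30_000),
    "H20":  (30_000, 12_000),
}
gpu_cost = sum(count * price for count, price in cluster.values())

# Networking, power, and datacenter build-out add a large multiple on
# top of the GPUs themselves; 1.3x here is a deliberately low guess.
total = gpu_cost * 1.3

print(f"GPUs alone: ${gpu_cost / 1e9:.2f}B, loaded: ${total / 1e9:.2f}B")
```

Even with conservative assumptions, the hardware alone lands near a billion dollars, which is the point being made about the six-million headline number.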
Obviously, if you're the person that makes an alternative that's that disruptive, you want it to be true that it was trained on six million dollars. All of that, I think is all speculation. The thing that struck me was how different their approach was and TK just mentioned this.
But if you dig into not just the original white paper from DeepSeek, but also some subsequent papers they've published that refine some of the details, I do think this is a case, and Sacks, you can tell me if you disagree, where necessity was the mother of invention. So I'll give you two examples where I read these things and I was like, man, these guys are really clever. The first is, as you said, let's put a pin in whether they distilled o1, which we can talk about in a second. But at the end of the day, these guys were like, well, how am I going to do this reinforcement learning thing? They invented a totally different algorithm. There was the orthodoxy, this thing called PPO that everybody used, and they were like, no, we're going to use something else, I think it's called GRPO. It uses a lot less memory and it's highly performant. So maybe they were constrained, Sacks, practically speaking, by some amount of compute, and that caused them to find this, which you may not have found if you had just a total surplus of compute availability.
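For the curious, here is a minimal sketch of the group-relative idea in GRPO as I read it from DeepSeek's published papers; the function and variable names are illustrative, not DeepSeek's actual code. The memory saving mentioned above comes from the baseline: PPO trains a separate learned value network (a critic) to estimate it, while GRPO samples a group of outputs per prompt and uses the group's own reward statistics instead, so there is no second model to hold in memory.

```python
# Sketch of GRPO's group-relative advantage (illustrative, not
# DeepSeek's code). Instead of a learned critic, each sampled output's
# reward is normalized against its own group's mean and std deviation.

from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each sampled output's reward against its own group."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All outputs scored the same: no signal to reinforce.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one prompt, scored by a reward model.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)
# Above-mean outputs get positive advantage (reinforced), below-mean
# outputs negative (discouraged), and the advantages sum to ~zero.
```

These advantages then weight the policy-gradient update in place of PPO's critic-derived values, which is where the compute and memory savings show up.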
And then the second thing that was crazy is that everybody is used to building models and compiling through CUDA, which is Nvidia's proprietary language, which I've said a couple of times is their biggest moat, but it's also the biggest vector for lock-in. And these guys worked totally around CUDA and used something called PTX, which goes right to the bare metal, and it's controllable; it's effectively like writing assembly. Now, the only reason I'm bringing these up is that we, meaning the West, with all the money that we've had, didn't come up with these ideas. And I think part of why we didn't is not that we're not smart enough to do it, but that we weren't forced to, because the constraints didn't exist. And so I just wonder how we make sure we learn this principle. Meaning, when the AI company wakes up and rolls out of bed and some VC gives them $200 million, maybe that's not the right answer for a Series A or a seed. And maybe the right answer is $2 million, so that they do these DeepSeek-like innovations. Constraint makes for great art. What do you think, Friedberg, when you're looking at this?
Well, I think it also enables a new class of investment opportunity. Given the low cost and the speed, it really highlights that maybe the opportunity to create value doesn't really sit at that level in the value chain, but further upstream. Balaji made a comment on Twitter today that was pretty funny; I think he was riffing about the wrapper. He's like, turns out the wrapper may be the moat, which is true at the end of the day. If model performance continues to improve, get cheaper, and it's so competitive that it commoditizes much faster than anyone even thought, then the value is going to be created somewhere else in the value chain. Maybe it's not the wrapper; maybe it's with the user. And maybe, by the way, here's an important point: maybe it's further out in the economy. You know, when electricity production took off in the United States, it's not like the companies making all the electricity made a lot of money. It's the rest of the economy that accrues a lot of the value.