All-In Podcast - AI Czar David Sacks Explains the DeepSeek Freak Out
Published: 2025-02-02 19:30:03
Original episode
This video discusses the recent buzz surrounding DeepSeek, a Chinese AI company, and its R1 model. The speaker begins by noting the unusual level of attention DeepSeek has garnered, turning into a global news story and influencing market capitalization, attributing it to two key factors: DeepSeek being a Chinese company competing with the US and its open-source approach contrasting with the closed-source model of companies like OpenAI. These factors resonated with diverse groups, including those who support international competition and open-source initiatives.
The conversation delves into which aspects of this story hold truth and which need debunking. It acknowledges the surprising fact that the second company to release a reasoning model comparable to OpenAI's o1 is a Chinese one. The speaker explains the difference between base language models (like GPT-4o or DeepSeek's V3) and the newer reasoning models, which use reinforcement learning and chain-of-thought to solve complex problems step by step. While OpenAI was the first to release a reasoning model, DeepSeek was next, and notably, it open-sourced its model, making it accessible at a significantly lower cost. This development has accelerated perceptions of China's progress in AI, with the estimated lag shrinking from 6-12 months to 3-6 months.
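To make the base-versus-reasoning distinction concrete, here is a purely illustrative sketch of the two output styles. The `<think>` delimiters follow the convention R1 uses to wrap its chain-of-thought; the prompt and the arithmetic example are invented for illustration.

```python
# Illustrative only: a base model vs. a reasoning model on the same question.
# The <think>...</think> tags mirror R1's published output convention; the
# example itself is made up.

prompt = "Q: What is 17 * 24?"

base_model_output = "A: 408"  # direct answer, no visible intermediate steps

reasoning_model_output = """<think>
17 * 24 = 17 * 20 + 17 * 4
        = 340 + 68
        = 408
</think>
A: 408"""
```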
The discussion then addresses the widely circulated claim that DeepSeek developed the R1 model for only $6 million. The speaker, aligned with Palmer Luckey and Brad Gerstner, argues that this figure is misleading and should be debunked. Even if $6 million represents the cost of the final training run, it doesn't account for the broader research and development investment behind it. Comparing it against the "soup to nuts," fully loaded costs of US AI companies is unfair. While validating the exact training cost is difficult, it's crucial to compare like with like.
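As a sanity check on why the headline number is narrow rather than simply fabricated, here is a back-of-envelope calculation. The GPU-hour count and rental rate are the figures DeepSeek reported in its V3 technical report (the apparent source of the circulating number); the list of excluded costs is illustrative.

```python
# Back-of-envelope: the widely cited ~$6M covers only the final training run.
# GPU-hours and $/hour are the figures reported in DeepSeek's V3 technical
# report, which assume rented compute rather than an owned cluster.

gpu_hours = 2_788_000          # reported H800 GPU-hours for the final run
dollars_per_gpu_hour = 2.0     # reported assumed rental rate
final_run_cost = gpu_hours * dollars_per_gpu_hour
print(f"final run: ${final_run_cost / 1e6:.1f}M")  # -> final run: $5.6M

# Not in that figure (hypothetical line items, for the like-with-like point):
# failed and ablation runs, data acquisition and cleaning, researcher
# salaries, and the capital cost of owning the hardware outright.
```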
Dylan Patel, a semiconductor analyst, estimates that DeepSeek possesses a substantial compute cluster of approximately 50,000 GPUs, including H100, H800, and H20 chips, potentially acquired through its founder's hedge fund activities. The cost of such a cluster would exceed a billion dollars, contradicting the "scrappy company" narrative. While acknowledging the difficulty of ascertaining accurate information given the vested interests involved, the speaker highlights noteworthy differences in DeepSeek's approach.
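A similarly rough calculation shows why a 50,000-GPU cluster lands north of a billion dollars. Only the GPU count comes from Patel's estimate; the per-GPU price and overhead multiplier below are illustrative assumptions, not figures from the episode.

```python
# Rough check on the ">$1B cluster" claim. Only the GPU count is from
# Dylan Patel's estimate; unit price and overhead are assumptions.

num_gpus = 50_000                # estimated mix of H100, H800, and H20
assumed_price_per_gpu = 25_000   # assumed blended $ price across the mix
assumed_overhead = 1.5           # assumed multiplier: networking, power, facilities

cluster_cost = num_gpus * assumed_price_per_gpu * assumed_overhead
print(f"~${cluster_cost / 1e9:.2f}B")  # -> ~$1.88B
```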
The conversation emphasizes the innovative algorithms and methodologies employed by DeepSeek. They were forced to invent a new reinforcement learning algorithm, called GRPO, which uses far less memory while remaining highly performant. Rather than relying on CUDA, Nvidia's proprietary programming layer, they dropped down to PTX, Nvidia's lower-level instruction set, to get closer to the bare metal and control the hardware more directly. This inventiveness, likely driven by resource constraints, allowed them to develop solutions that Western companies, flush with capital, had not pursued. The video ponders whether readily available large funding rounds hinder the innovation that arises from necessity.
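For readers curious what "group relative" means in GRPO, the sketch below shows the advantage step that replaces PPO's learned value network, which is where the memory saving comes from. This is a minimal reading of the published GRPO formulation, not DeepSeek's actual training code; the clipped policy-gradient update that surrounds this step is omitted.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages, per the GRPO formulation.

    `rewards` holds one scalar per sampled completion for the SAME prompt
    (a "group"). The group mean serves as the baseline, so no separate
    critic/value network has to be trained or held in memory, unlike PPO.
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy usage: four sampled answers to one math prompt, scored by a
# rule-based reward (1.0 if the final answer checks out, else 0.0).
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(grpo_advantages(rewards))  # correct samples get positive advantage
```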
Friedberg contributes by suggesting this shift highlights new investment opportunities. He cites Balaji Srinivasan's comment that "the wrapper" (the application layer facing the user) is the new moat in the value chain. If model performance keeps improving, value creation will move further upstream or downstream. The point raised is that while the companies creating the models may not get rich, the value will be captured elsewhere in the chain.