China's DeepSeek says training its popular AI model cost just $294,000
2025-09-23
Article translation
BEIJING, Sept 18 (Reuters) - Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model, much lower than figures reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence.
The rare update from the Hangzhou-based company - the first estimate it has released of R1's training costs - appeared in a peer-reviewed article in the academic journal Nature published on Wednesday.
DeepSeek's release of what it said were lower-cost AI systems in January prompted global investors to dump tech stocks as they worried the new models could threaten the dominance of AI leaders including Nvidia (NVDA.O).
Since then, the company and founder Liang Wenfeng have largely disappeared from public view, apart from pushing out a few new product updates.
The Nature article, which listed Liang as one of the co-authors, said DeepSeek's reasoning-focused R1 model cost $294,000 to train and used 512 Nvidia H800 chips. A previous version of the article published in January did not contain this information.
Training costs for the large-language models powering AI chatbots refer to the expenses incurred from running a cluster of powerful chips for weeks or months to process vast amounts of text and code.
Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that the training of foundational models had cost "much more" than $100 million - though his company has not given detailed figures for any of its releases.
Some of DeepSeek's statements about its development costs and the technology it used have been questioned by U.S. companies and officials.
The H800 chips it mentioned were designed by Nvidia for the Chinese market after the U.S. in October 2022 made it illegal for the company to export its more powerful H100 and A100 AI chips to China.
U.S. officials told Reuters in June that DeepSeek has access to "large volumes" of H100 chips that were procured after U.S. export controls were implemented. Nvidia told Reuters at the time that DeepSeek has used lawfully acquired H800 chips, not H100s.
In a supplementary information document accompanying the Nature article, the company acknowledged for the first time it does own A100 chips and said it had used them in preparatory stages of development.
"Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote. After this initial phase, R1 was trained for a total of 80 hours on a cluster of 512 H800 chips, they added.
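The paper's own numbers allow a back-of-envelope check. Assuming the $294,000 figure covers only the 80-hour run on the 512-chip H800 cluster (Reuters does not break the cost down, so the implied hourly rate is illustrative only):

```python
# Illustrative arithmetic from the figures reported in the Nature paper.
num_gpus = 512        # H800 chips in the training cluster, per the paper
hours = 80            # reported R1 training time on that cluster
total_cost = 294_000  # USD, per the paper

gpu_hours = num_gpus * hours
cost_per_gpu_hour = total_cost / gpu_hours

print(f"{gpu_hours} GPU-hours, ~${cost_per_gpu_hour:.2f} per GPU-hour")
```

That works out to 40,960 GPU-hours at roughly $7 per GPU-hour, which is one way to read how the headline figure compares with the much larger sums cited by U.S. rivals.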
Reuters has previously reported that one reason DeepSeek was able to attract the brightest minds in China was because it was one of the few domestic companies to operate an A100 supercomputing cluster.
MODEL DISTILLATION
DeepSeek also responded for the first time, though not directly, to assertions from a top White House adviser and other U.S. AI figures in January that it had deliberately "distilled" OpenAI's models into its own.
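For readers unfamiliar with the term, "distillation" generically means training a smaller "student" model to reproduce the output distribution of a stronger "teacher" model. The sketch below shows the idea in miniature; it is the textbook technique, not DeepSeek's or OpenAI's actual training setup, and the logits are made-up values:

```python
import math

def softmax(logits):
    # Convert raw model scores into a probability distribution.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 1.0, 0.1]   # hypothetical teacher outputs for one input
student_logits = [1.8, 1.1, 0.3]   # hypothetical student outputs

loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
# Distillation training would adjust the student's weights to push this
# loss toward zero, so the student mimics the teacher's behavior.
print(f"distillation loss: {loss:.4f}")
```

The accusation, in these terms, is that OpenAI's models served as the teacher for DeepSeek's student models without permission.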

BEIJING, Sept 18 (Reuters) - Chinese AI developer DeepSeek said its R1 model cost $294,000 to train, far below the figures reported for its U.S. rivals. The number appeared in a paper that may reignite debate over China's standing in the race to develop artificial intelligence.

Such disclosures from the Hangzhou-based company are rare, and this is the first time it has published a training-cost figure for R1. The figure appeared in a peer-reviewed article published Wednesday in the academic journal Nature.

In January, DeepSeek released what it said were lower-cost AI systems. On the news, global investors dumped tech stocks, worried that the new models could threaten the dominance of AI leaders such as Nvidia.

Since then, apart from a handful of product updates, DeepSeek and its founder Liang Wenfeng have largely retreated from public view.

The Nature article lists Liang as a co-author. It says DeepSeek's reasoning-focused R1 model cost $294,000 to train, using 512 Nvidia H800 chips. The version of the paper first published in January did not include this information.

For the large language models behind AI chatbots, training cost refers to the expense of running a cluster of high-performance chips for weeks or months to process vast amounts of text and code.

Sam Altman, CEO of U.S. AI giant OpenAI, said in 2023 that training foundational models had cost "much more" than $100 million, though the company has not published detailed cost figures for any of its releases.

Some of DeepSeek's claims about its development costs and the technology it used have drawn questions from U.S. companies and officials.

The H800 chips the company mentioned were designed by Nvidia for the Chinese market after the U.S. in October 2022 barred Nvidia from exporting its more powerful H100 and A100 AI chips to China.

In June, U.S. officials told Reuters that DeepSeek had access to "large volumes" of H100 chips procured after U.S. export controls took effect. Nvidia told Reuters at the time that DeepSeek had used lawfully acquired H800 chips, not H100s.

In a supplementary information document accompanying the Nature article, DeepSeek acknowledged for the first time that it does own A100 chips and said they were used in the preparatory stages of development.

"Regarding our research on DeepSeek-R1, we utilized the A100 GPUs to prepare for the experiments with a smaller model," the researchers wrote. After that initial phase, they added, the team trained R1 for a total of 80 hours on a cluster of 512 H800 chips.

Reuters has previously reported that one reason DeepSeek could attract China's brightest minds was that it was among the few domestic companies operating an A100 supercomputing cluster.

MODEL DISTILLATION

In January, a senior White House adviser and other U.S. AI figures asserted that DeepSeek had deliberately "distilled" OpenAI's models into its own. DeepSeek has now responded to such claims for the first time, though not directly.