## Training Large Language Models from Scratch: Is It Pointless?

**Keywords:** Model Training, Limited Value, Pretraining

**News Content:**

## “Training models from scratch is almost meaningless”: Experts advise AI companies to focus on systems, not models

**InfoQ** recently published an article arguing that, in the field of artificial intelligence, **“training models from scratch is almost meaningless”**, and recommending that companies focus their resources on building systems rather than models. The article’s authors, including Eugene Yan, argue that many companies make the mistake of investing heavily in model training without a clear product vision or target market.

The article points out that for most organizations, training large models from scratch is an impractical distraction that diverts energy and resources from building actual products. Developing and maintaining machine learning infrastructure requires significant investment, including collecting data, training and evaluating models, and deploying them. Pretrained large models can also become outdated within months. For example, BloombergGPT, trained specifically for financial tasks, was surpassed by gpt-3.5-turbo and gpt-4 within a year.

The article suggests that, rather than training from scratch, companies should consider fine-tuning the most capable open-source model that might meet their specific needs. Even then, fine-tuning should only be considered after collecting a large number of examples and confirming that other methods are insufficient to solve the problem.

The article also emphasizes the concept of “models are not products” and states that companies should focus on building robust systems and delivering memorable, sticky experiences. The article recommends that companies invest resources in the following areas:

* **Building AI tools that support and enhance human capabilities, rather than attempting to completely replace humans.**
* **Focusing on areas like prompt engineering, evaluation, and data collection to build reliable, scalable products.**
* **Continuously improving and bridging the gap between prototyping and production.**

The article concludes by stating that the rapidly falling cost and growing capabilities of large models will shape the future of AI applications. It advises companies to examine historical trends and use a simple method to estimate when certain applications might become economically viable.
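This summary does not spell out the estimation method, so the following is only a minimal sketch of what such a back-of-the-envelope calculation might look like. It assumes the cost of running an application falls by a constant fraction each year; the function name `years_until_viable`, its parameters, and all numbers are hypothetical and chosen purely for illustration.

```python
import math


def years_until_viable(current_cost: float, target_cost: float, annual_decline: float) -> float:
    """Estimate years until cost reaches target_cost.

    Assumption (not from the article): cost shrinks by the same
    fraction `annual_decline` every year, e.g. 0.5 means it halves yearly.
    """
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1 (exclusive)")
    if current_cost <= 0 or target_cost <= 0:
        raise ValueError("costs must be positive")
    if current_cost <= target_cost:
        return 0.0  # already at or below the target cost
    # Solve current_cost * (1 - annual_decline) ** t = target_cost for t.
    return math.log(target_cost / current_cost) / math.log(1.0 - annual_decline)


# Hypothetical numbers: cost per task today is 1.00, the product becomes
# viable at 0.25, and cost is assumed to halve every year.
years = years_until_viable(current_cost=1.00, target_cost=0.25, annual_decline=0.5)
print(f"Estimated years until viable: {years:.1f}")
```

The constant-decline assumption is the weakest part of the sketch. Real price trends are uneven, so the result is better read as a rough planning horizon than as a forecast.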

The article’s perspective has garnered widespread attention in the industry, with many experts expressing agreement. They believe that, at the current stage of AI development, companies should take a more rational approach to model training and focus resources on building systems and products in order to truly realize the value of AI.

Source: https://mp.weixin.qq.com/s/Qg-ssbMnvCQ4Hw_LPGUySg