News Title: “8B Model with 100 Searches Beats GPT-4o; AI Performance Breakthrough Draws Attention”

Keywords: Llama 8B, Search Scaling, Performance Improvement

News Content: The reported results support the following conclusions:

1. **The Importance of Search**: Recent research indicates that large language models (LLMs) can scale their performance through search, particularly at inference time. This suggests that search is not only part of the learning process but also a key means of improving model performance.

2. **Potential of Smaller Models**: The paper reports that Llama 3.1 with only 8B parameters can reach a level comparable to GPT-4o on Python code generation tasks when given 100 searches. This indicates that smaller models remain competitive on specific tasks, especially when paired with search.

3. **Scaling Law Insights**: Rich Sutton's 2019 essay “The Bitter Lesson” emphasizes the power of general methods and argues that learning and search are the methods that keep scaling as compute increases. Smaller models therefore still matter: they can gain performance through search and related techniques.

4. **Relationship Between Search and Evaluation**: The paper stresses that high-quality evaluation is essential for search methods to work. Results in mathematics from DeepMind and other institutions show that translating problems stated in natural language into formal statements increases the degree of automation and parallelism.

5. **Future Research Directions**: The use of formal methods in AI for mathematics shows that search combined with evaluation has considerable potential in specific domains. Future research may therefore focus on applying search to other domains, especially tasks where effective search is difficult.

6. **Replicability of Experiments**: Experiments by two engineers show that running search at inference time with 100 samples from a small Llama model can beat GPT-4o on Python programming tasks. The engineers released their source code, so other researchers can verify and extend the results.

7. **Cost-Effectiveness**: The experiments are cheap to reproduce, which suggests the method is economically viable and may suit further research and commercial applications.
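The pairing of repeated sampling with an evaluator described in the points above can be illustrated with a minimal Python sketch. It is not the engineers' code: `CANDIDATE_POOL`, `sample_candidate`, `passes_tests`, and `search` are hypothetical names, and a fixed pool of three programs stands in for sampling a small model. The only point it makes is that a reliable evaluator (here, unit tests) lets many cheap samples be filtered down to one that works.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-in for sampling a small LLM: each call returns one
# candidate program as source text. A real setup would call a model here.
CANDIDATE_POOL = [
    "def add(a, b):\n    return a - b\n",  # wrong
    "def add(a, b):\n    return a * b\n",  # wrong
    "def add(a, b):\n    return a + b\n",  # correct
]


def sample_candidate(rng: random.Random) -> str:
    """Draw one candidate program (assumption: samples are independent)."""
    return rng.choice(CANDIDATE_POOL)


def passes_tests(source: str) -> bool:
    """Evaluator: execute the candidate and check it against known cases."""
    namespace: dict = {}
    try:
        # exec is acceptable only because the pool above is trusted text;
        # model-generated code would need a sandbox.
        exec(source, namespace)
        add = namespace["add"]
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:
        return False


def search(
    n_samples: int,
    rng: random.Random,
    evaluate: Callable[[str], bool] = passes_tests,
) -> Optional[str]:
    """Draw up to n_samples candidates; return the first one that passes."""
    for _ in range(n_samples):
        candidate = sample_candidate(rng)
        if evaluate(candidate):
            return candidate
    return None


if __name__ == "__main__":
    best = search(n_samples=100, rng=random.Random(0))
    print(best if best is not None else "no passing candidate found")
```

The search loop itself is trivial. What carries the weight is `passes_tests`, which is why the quality of the evaluation matters so much for this style of search.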

In summary, search at inference time provides significant performance gains for LLMs, and smaller models can become competitive through search. As the technology matures, search methods may be applied in more domains, particularly as evaluation and formal methods develop further.
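A rough piece of arithmetic shows why many samples can help so much. Under the simplifying assumptions that every sample independently solves the task with the same probability `p` and that the evaluator always recognizes a correct answer, the chance that at least one of `n` samples succeeds is `1 - (1 - p)^n`. Neither assumption comes from the source, the function name `coverage` is hypothetical, and the values of `p` below are illustrative rather than measured.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples succeeds,
    assuming each succeeds with probability p (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative per-sample success rates, not measurements.
    for p in (0.01, 0.05, 0.20):
        print(f"p={p:.2f}: 1 sample -> {coverage(p, 1):.3f}, "
              f"100 samples -> {coverage(p, 100):.3f}")
```

In practice samples from one model are correlated and evaluators are imperfect, so real gains will be smaller than this idealized curve suggests.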

Source: https://www.ithome.com/0/788/830.htm