**News Title:** “USC Study Probes ChatGPT’s Parameter Count: Just 7 Billion, Challenging Large-Model Scaling Norms?”
**Keywords:** USC study, GPT parameters, model performance
**News Content:**
**LOS ANGELES** — A research team from the University of Southern California (USC) has reached a noteworthy conclusion: OpenAI’s popular ChatGPT model may have far fewer parameters than widely assumed, roughly 7 billion. The estimate stems from the team’s recovery of gpt-3.5-turbo’s undisclosed embedding (hidden-state) dimension, which they determined to be either 4096 or 4608.
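The article does not say how the dimension was recovered, but a known family of probing techniques exploits the fact that every output logit vector is a linear image of a d-dimensional hidden state, so a stack of many logit vectors has numerical rank d. Below is a minimal, self-contained sketch of that rank argument on synthetic data; the sizes, the random stand-in matrices, and the tolerance are illustrative assumptions, not the authors’ code or gpt-3.5-turbo’s real dimensions.

```python
import numpy as np

# Sketch: logits = W @ h, with W the (vocab x d) output projection and h a
# d-dimensional hidden state, so all logit vectors live in a d-dimensional
# subspace of vocab-dimensional space. Stacking enough of them and counting
# the significant singular values recovers d. Sizes are scaled down here
# (a real probe would use logits returned by the API over many prompts).
rng = np.random.default_rng(0)
VOCAB, HIDDEN, SAMPLES = 8_000, 1_024, 2_000   # hypothetical, scaled-down sizes

W = rng.normal(size=(VOCAB, HIDDEN))           # stand-in output projection
H = rng.normal(size=(HIDDEN, SAMPLES))         # hidden states from many "prompts"
logits = W @ H                                 # (VOCAB, SAMPLES) observed logits

# Singular values fall off a cliff after the first HIDDEN components.
s = np.linalg.svd(logits, compute_uv=False)
estimated_dim = int(np.sum(s > s[0] * 1e-10))
print(f"estimated hidden dimension: {estimated_dim}")   # -> 1024
```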
Typically, open-source large models such as Llama and Mistral with an embedding dimension of 4096 have around 7 billion parameters. However, the USC researchers point out that if ChatGPT is of a similar scale yet still performs at its observed level, this would break the established pattern that a network that is too wide or too narrow for its size suffers degraded performance. They propose that unless ChatGPT employs a different architecture, such as a Mixture of Experts (MoE) design, the observation is hard to explain.
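For intuition on why a 4096-wide model lands near 7 billion parameters, here is a back-of-the-envelope count using Llama-7B’s published hyperparameters (an assumption about the architecture family only; none of these values are confirmed for gpt-3.5-turbo):

```python
# Llama-7B hyperparameters (public); assumed here purely for illustration.
d_model, n_layers, d_ff, vocab = 4096, 32, 11008, 32000

attn_per_layer = 4 * d_model * d_model     # Q, K, V and output projections
mlp_per_layer = 3 * d_model * d_ff         # SwiGLU MLP uses three weight matrices
embeddings = 2 * vocab * d_model           # input embedding + untied LM head

total = n_layers * (attn_per_layer + mlp_per_layer) + embeddings
print(f"~{total / 1e9:.2f}B parameters")   # -> ~6.74B
```

Since the per-layer terms scale with the square of the width, the hidden dimension alone is a strong hint about a dense model’s overall size.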
The study’s three authors note that an MoE architecture lets a model carry a large total parameter count while routing each input through only a small subset of experts, which could explain how ChatGPT achieves strong performance despite a modest active parameter count. The finding raises new considerations for model design and optimization in AI and may encourage the development of more efficient, leaner models.
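To make the routing idea concrete, here is a generic top-k MoE layer in NumPy. This is an illustrative sketch of the general technique only; it says nothing about OpenAI’s actual design, and every size and weight below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

# One gating matrix plus n_experts independent two-layer MLP "experts".
W_gate = rng.normal(size=(d_model, n_experts))
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (n_tokens, d_model) -> (n_tokens, d_model).

    Each token is routed to only `top_k` of `n_experts` experts, so total
    parameters can grow with n_experts while per-token compute stays small.
    """
    scores = x @ W_gate                              # (n_tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -top_k:]    # chosen expert indices
    sel = np.take_along_axis(scores, top, axis=-1)   # their gating scores
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over chosen experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # naive per-token dispatch
        for j, e in enumerate(top[t]):
            h = np.maximum(x[t] @ W_in[e], 0.0)      # ReLU expert MLP
            out[t] += weights[t, j] * (h @ W_out[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_forward(tokens).shape)                     # (4, 64)
```

Only 2 of the 8 experts run per token, so the active parameter count per token is a fraction of the layer’s total, which is the effect the authors suggest could reconcile a small apparent width with strong performance.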
The findings were reported by the tech outlet _Quantum Bit_ (量子位), sparking lively discussion in the industry about ChatGPT’s inner workings and renewing questions about the relationship between model scale and performance.
**Source:** https://mp.weixin.qq.com/s/y0RQ0aOrHGLzLJKxbyGxMw