**News Title:** “USC Study Suggests ChatGPT’s Parameter Count May Be Just 7 Billion, Breaking the Large Model Mold?”
**Keywords:** USC study, ChatGPT parameters, model performance
**News Content:** **USC Research Reveals New Insights into ChatGPT’s Parameter Scale** According to QbitAI, a recent study from the University of Southern California (USC) suggests that the much-discussed ChatGPT may have a parameter count of only about 7 billion. Three researchers at the university reached this conclusion by analyzing the embedding vector dimension of gpt-3.5-turbo, a model whose architecture OpenAI has not publicly disclosed.
They posit that gpt-3.5-turbo’s embedding vector dimension is likely 4096 or 4608, dimensions at which known open-source large models such as Llama and Mistral typically carry a parameter count of around 7 billion (7B).
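As a rough sanity check on this line of reasoning, the standard parameter-count formula for a dense decoder-only transformer can be applied to a 4096-wide model. The sketch below is purely illustrative: the layer and vocabulary figures come from the public Llama-2-7B configuration, not from anything known about gpt-3.5-turbo, and norms plus the output head are ignored.

```python
# Rough parameter-count estimate for a dense decoder-only transformer.
# Illustrative sketch only: layer/vocab numbers follow the public
# Llama-2-7B config; gpt-3.5-turbo's actual configuration is unknown.

def dense_param_count(d_model: int, n_layers: int, vocab_size: int,
                      d_ff_mult: float = 8 / 3) -> int:
    """Approximate parameters of a Llama-style decoder-only transformer."""
    attn = 4 * d_model * d_model        # Q, K, V, and output projections
    d_ff = int(d_ff_mult * d_model)     # SwiGLU hidden width (Llama-style)
    mlp = 3 * d_model * d_ff            # gate, up, and down projections
    embed = vocab_size * d_model        # token embedding (output head omitted)
    return n_layers * (attn + mlp) + embed

# Llama-2-7B-like config: d_model=4096, 32 layers, 32k vocabulary
print(f"{dense_param_count(4096, 32, 32_000) / 1e9:.1f}B")  # -> 6.6B
```

Under those assumptions the estimate lands at roughly 6.6B, squarely in the "7B class" the researchers describe for a 4096-dimensional model.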
The research team underscores the delicate balance between a model’s parameter scale and its performance. With the embedding vector dimension fixed at 4096, they argue, a parameter count that is too large or too small yields a network that is too wide or too narrow, degrading performance; on this basis they estimate gpt-3.5-turbo’s parameter count at around 7B. However, they also note that this inference breaks down if the model uses an MoE (Mixture of Experts) architecture: an MoE model stores many expert sub-networks but activates only a few per token, so its total parameter count can be much larger than its width alone suggests.
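The MoE caveat can be made concrete with a hypothetical example. In a Mixture-of-Experts layer, all experts are stored but only the top-k are executed per token, so the total parameter count can far exceed the "active" count that a width-based estimate would capture. Every number below is an assumption chosen purely for illustration.

```python
# Hypothetical MoE example: total vs. active ("per-token") parameters.
# All numbers are illustrative assumptions, not gpt-3.5-turbo's real config.

def moe_params(d_model: int, n_layers: int, n_experts: int, top_k: int,
               d_ff: int) -> tuple[int, int]:
    """Return (total, active) parameters of the MoE MLP blocks only."""
    expert = 2 * d_model * d_ff            # up- and down-projection per expert
    total = n_layers * n_experts * expert  # every expert is stored
    active = n_layers * top_k * expert     # but only top_k run per token
    return total, active

total, active = moe_params(d_model=4096, n_layers=32, n_experts=8, top_k=2,
                           d_ff=11_008)
print(f"total MLP params:  {total / 1e9:.1f}B")   # ~23.1B stored
print(f"active MLP params: {active / 1e9:.1f}B")  # ~5.8B used per token
```

Under these made-up settings the model stores about 23B MLP parameters while using under 6B per token, which is why an MoE design would undermine any inference of model size from the embedding dimension alone.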
This study offers a fresh perspective on the inner workings of large language models and has sparked in-depth industry discussion of the relationship between ChatGPT’s scale and its efficiency. While OpenAI has not commented on the findings, the study nonetheless provides a valuable reference point for AI researchers.
**Source:** https://mp.weixin.qq.com/s/y0RQ0aOrHGLzLJKxbyGxMw