**News Title:** “USC Study Casts Doubt on ChatGPT’s Parameter Scale: Could It Be Just 7 Billion, Breaking the Norm for Large Models?”
**Keywords:** USC study, ChatGPT parameters, model performance
**News Content:**
_A recent USC study sheds light on a mystery surrounding ChatGPT’s parameter size, suggesting it might be significantly smaller than anticipated._
A new study from the University of Southern California (USC) has drawn industry attention by suggesting that the much-hyped ChatGPT model may have a far lower parameter count than previously assumed. Three USC researchers report that they recovered the unpublished embedding dimension of gpt-3.5-turbo, determining it to be either 4096 or 4608. This finding is an important clue to ChatGPT's inner workings.
Typically, open-source large language models such as Llama and Mistral pair a 4096-dimensional embedding with roughly 7 billion (7B) parameters. The researchers argue that this width-to-size ratio matters for performance: networks that are too wide or too narrow for their parameter budget tend to underperform. Based on this ratio and the recovered dimension, they therefore speculate that gpt-3.5-turbo may also have a parameter count around 7B.
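The width-to-parameter relationship the researchers lean on can be illustrated with back-of-the-envelope arithmetic for a dense decoder-only transformer. The sketch below is not from the study; the layer count, vocabulary size, and FFN multiplier are assumptions modeled loosely on Llama-7B-class models:

```python
def estimate_params(d_model, n_layers, vocab_size=32000, ffn_mult=4):
    """Rough parameter count for a dense decoder-only transformer.

    Assumed shapes (not from the USC study): Q/K/V/output attention
    projections, a two-matrix feed-forward block, and tied-size
    input embedding plus LM head.
    """
    attn = 4 * d_model * d_model              # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)  # up- and down-projections
    embed = 2 * vocab_size * d_model          # input embedding + LM head
    return n_layers * (attn + ffn) + embed

# an embedding width of 4096 with Llama-like depth lands near 7B
print(estimate_params(4096, 32))  # ~6.7e9
```

Under these assumptions a 4096-wide model comes out at roughly 6.7B parameters, which is the order of magnitude behind the "about 7B" inference.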
However, the research team also raises the possibility that gpt-3.5-turbo employs a Mixture of Experts (MoE) architecture. Because an MoE model routes each input through only a subset of its expert networks, its total parameter count can far exceed the parameters active per token, breaking the conventional relationship between embedding dimension and model size. This hypothesis awaits further validation.
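The MoE idea described above can be sketched with a toy top-k gating layer. This is a generic illustration of the technique, not OpenAI's architecture; all names (`moe_forward`, the scaling "experts", the gate weights) are hypothetical:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route input x through the top-k experts chosen by a linear gate."""
    logits = x @ gate_w                 # one score per expert
    topk = np.argsort(logits)[-k:]      # indices of the k best-scoring experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()            # softmax over the selected scores only
    # only k expert networks run; the others cost nothing at inference time
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# toy usage: four "experts" that just scale their input
experts = [lambda v, c=c: c * v for c in (1.0, 2.0, 3.0, 4.0)]
gate_w = np.eye(4)                      # hypothetical gate weights
x = np.array([0.0, 0.0, 1.0, 2.0])
y = moe_forward(x, experts, gate_w, k=2)  # blends experts 2 and 3
```

The design point relevant to the article: total capacity scales with the number of experts, while per-token compute scales only with `k`, so embedding width alone no longer pins down the parameter count.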
For AI and natural language processing practitioners, the result offers a fresh lens on a closely guarded frontier model. It also underscores that understanding such systems requires probing their internal structure and parameter scale, not just observing their behavior. Further work along these lines will help drive the continued development and optimization of the field.
【来源】https://mp.weixin.qq.com/s/y0RQ0aOrHGLzLJKxbyGxMw