【USC Study Probes ChatGPT's Parameter Count: Possibly Just 7 Billion】

According to QbitAI, a research team at the University of Southern California (USC) recently conducted an in-depth analysis of OpenAI's GPT-3.5-Turbo model to estimate its likely parameter count. The study infers that GPT-3.5-Turbo's embedding dimension is 4096 or 4608, which matches the configuration of open-source large models such as Llama and Mistral at roughly 7 billion (7B) parameters.

The researchers explain that a model with an embedding dimension of 4096 typically has around 7B parameters; deviating far from this ratio makes the network disproportionately wide or narrow, which tends to hurt performance. Based on the available data and these standard design conventions, they therefore estimate that GPT-3.5-Turbo most likely also has roughly 7B parameters.
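The dim-4096 → ~7B correspondence the study relies on can be checked with simple arithmetic. The sketch below uses the published Llama-7B configuration (hidden size 4096, 32 layers, gated FFN of width 11008, vocabulary 32000) as the reference point; it is an approximation of a standard dense transformer, not a description of OpenAI's actual architecture.

```python
# Rough dense-transformer parameter estimate, using Llama-7B's published
# hyperparameters as the reference. Layer norms and biases are omitted
# (they contribute well under 1% of the total).

def estimate_params(d_model: int, n_layers: int, d_ffn: int, vocab_size: int) -> int:
    attn = 4 * d_model * d_model           # Q, K, V, O projection matrices
    ffn = 3 * d_model * d_ffn              # gated FFN: gate, up, down projections
    per_layer = attn + ffn
    embeddings = 2 * vocab_size * d_model  # untied input + output embeddings
    return n_layers * per_layer + embeddings

total = estimate_params(d_model=4096, n_layers=32, d_ffn=11008, vocab_size=32000)
print(f"{total / 1e9:.2f}B")  # 6.74B, commonly rounded to "7B"
```

The estimate lands at 6.74B, matching Llama-7B's actual parameter count, which is why an embedding dimension of 4096 is treated as a strong signal of a ~7B dense model.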

The team cautions, however, that if GPT-3.5-Turbo uses an MoE (Mixture of Experts) architecture, the parameter count could differ substantially. An MoE network activates only a subset of its parameters for each token via a dynamic routing mechanism, so it can store many more parameters while keeping per-token compute comparable, which adds uncertainty to the estimate.
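The reason MoE breaks the dimension-based inference is that total and active parameter counts diverge: the embedding dimension only pins down the compute per token, not how many experts are stored. A minimal sketch, with purely illustrative numbers (the expert count and routing top-k below are assumptions, not known properties of GPT-3.5-Turbo):

```python
# Total vs. active parameters in a top-k-routed MoE FFN layer.
# Numbers are illustrative; they do not describe GPT-3.5-Turbo.

def moe_params(dense_ffn_params: int, n_experts: int, top_k: int) -> tuple[int, int]:
    total = dense_ffn_params * n_experts  # all experts are stored in memory
    active = dense_ffn_params * top_k     # only top_k experts run per token
    return total, active

ffn = 135_000_000  # roughly one dense FFN layer's params at d_model = 4096
total, active = moe_params(ffn, n_experts=8, top_k=2)
print(total, active)  # 8x the parameters stored, only 2x computed per token
```

So two models with identical embedding dimension and near-identical inference cost can differ several-fold in stored parameters, which is exactly the uncertainty the researchers flag.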

The finding offers a new lens on ChatGPT's performance and design, and has prompted broader industry discussion of the trade-off between model scale and capability. The USC results provide a useful reference for optimizing and developing future large models, and may help advance AI technology further.


【Source】https://mp.weixin.qq.com/s/y0RQ0aOrHGLzLJKxbyGxMw
