
Caiyun Technology Releases "云锦天章," a General-Purpose Large Model Based on the DCFormer Architecture, Advancing the Transformer

November 13, 2024 – Caiyun Technology, a Chinese artificial intelligence company, has released "云锦天章," a general-purpose large model built on its new DCFormer architecture, which the team presented at ICML (the International Conference on Machine Learning), a top-tier venue in machine learning. It is the industry's first large model released on the DCFormer architecture, marking another significant upgrade to the Transformer.

Evolution and Breakthroughs of the Transformer Architecture

In 2017, the Google paper "Attention Is All You Need" first proposed the Transformer architecture, fundamentally changing the direction of natural language processing (NLP) in artificial intelligence. As the most influential architecture in neural network research, the Transformer became the technical foundation of the general-purpose large models that later swept the world, such as ChatGPT and Gemini.

In recent years, improving the Transformer's runtime efficiency has become a hot research topic in AI. In April of this year, Google updated the Transformer architecture with the Mixture-of-Depths (MoD) method, which speeds up post-training sampling by 50%, another major milestone in accelerating the Transformer.
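
For readers who want intuition for how MoD saves compute, here is a minimal, hypothetical sketch in PyTorch. It is not Google's implementation: the names MoDBlock and capacity, and the assumption that the wrapped block returns a residual update, are ours. The idea is that a small learned router scores every token, only the top-scoring fraction is run through the expensive attention/MLP block, and the remaining tokens skip it via the residual path.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Route only the top-k scored tokens through the wrapped block;
    all other tokens skip it on the residual path."""

    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.block = block                    # assumed to return a residual update
        self.router = nn.Linear(d_model, 1)   # per-token routing score
        self.capacity = capacity              # fraction of tokens processed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        k = max(1, int(t * self.capacity))
        scores = self.router(x).squeeze(-1)                    # (b, t)
        top = scores.topk(k, dim=-1).indices.sort(-1).values   # keep sequence order
        idx = top.unsqueeze(-1).expand(-1, -1, d)              # (b, k, d)
        routed = x.gather(1, idx)                              # selected tokens only
        update = self.block(routed)                            # heavy compute on k tokens
        # gate by the router score so the router receives gradients
        gate = torch.sigmoid(scores.gather(1, top)).unsqueeze(-1)
        return x.scatter(1, idx, routed + gate * update)
```

Wrapping each Transformer layer this way, e.g. MoDBlock(layer, d_model=512, capacity=0.125), runs the heavy computation on only 12.5% of tokens per layer, which illustrates where the speedup would come from.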

Caiyun Technology's Breakthrough: The DCFormer Architecture

At ICML, the Caiyun Technology team presented the paper "Improving Transformers with Dynamically Composable Multi-Head Attention," which first proposed the DCFormer architecture. By dynamically composing the heads of multi-head attention, DCFormer achieves a marked improvement over the standard Transformer, outperforming the open-source Pythia models on both pretraining perplexity and downstream task evaluations.
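
The core idea, as the paper's title suggests, is to make the combination of attention heads input-dependent rather than fixed. The sketch below is a simplified, hypothetical rendering of that idea in PyTorch, not the authors' exact formulation: it mixes only the pre-softmax logits, omits causal masking, and names such as DCMHASketch and compose are ours. A small projection predicts, for each query position, a head-by-head matrix that recombines the heads' attention logits.

```python
import math
import torch
import torch.nn as nn

class DCMHASketch(nn.Module):
    """Simplified sketch: mix per-head attention logits across heads
    with query-dependent weights before the softmax."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # predicts an h-by-h head-mixing matrix for each query position
        self.compose = nn.Linear(d_model, n_heads * n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.h, self.dk).transpose(1, 2)      # (b, h, t, dk)
        k = k.view(b, t, self.h, self.dk).transpose(1, 2)
        v = v.view(b, t, self.h, self.dk).transpose(1, 2)
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.dk)  # (b, h, t, t)
        # dynamic composition: recombine the heads' logits per query token
        w = self.compose(x).view(b, t, self.h, self.h)         # (b, t, h, h)
        mixed = torch.einsum("btij,bjts->bits", w, logits)     # (b, h, t, t)
        attn = mixed.softmax(dim=-1)                           # causal mask omitted
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)
```

In a plain multi-head attention layer, heads only interact through the fixed output projection; here the mixing weights w are recomputed from the input at every position, which is the "dynamically composable" property the paper's title refers to.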

"云锦天章": A Powerful Model Built on the DCFormer Architecture

Caiyun Technology's DCFormer-based model, DCPythia-6.9B, has been named "云锦天章." The model performs strongly in several respects:

  • Higher efficiency: The DCFormer architecture significantly improves the Transformer's runtime efficiency, making both training and inference faster.
  • Stronger performance: DCPythia-6.9B outperforms the open-source Pythia models on both pretraining perplexity and downstream task evaluations.
  • Broader applicability: "云锦天章" can be applied to a wide range of NLP tasks, including text generation, machine translation, and question answering.

Outlook: Prospects for the DCFormer Architecture

Caiyun Technology's DCFormer architecture offers a new path for upgrading the Transformer and opens new opportunities for the AI field. Going forward, DCFormer is expected to find use in more domains and to help drive the development and adoption of AI technology.

References

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

Author: 机器之心 editorial team

Editor: 张倩

Contact: [email protected]

Copyright notice: This article is original content from 机器之心. To republish, please contact 机器之心 for authorization.

