Mobvoi Open-Sources the “Sequence Monkey” Dataset

Author: 智能小编 (AI Editor)

February 24, 2024 #Open Source, #Text Corpus, #Daily AI News, #Language Models

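Mobvoi recently announced that it is opening part of the training data for its very-large-scale language model “Sequence Monkey” to the public, sharing another significant resource with the artificial intelligence community. The release, named “Sequence Monkey Open Dataset 1.0”, comprises a general Chinese text corpus, a corpus of classical poems paired with modern Chinese translations, and a text generation corpus. It is intended to support research and innovation in natural language processing.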

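Mobvoi is a company focused on artificial intelligence technology, and its “Sequence Monkey” model is influential in the natural language processing field. The open-sourced dataset gives researchers a valuable resource and gives developers a platform for learning and practice. Researchers can use the corpora to better understand the structure of the Chinese language, while developers can use the data to train their own models and help advance artificial intelligence technology.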

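The release of “Sequence Monkey Open Dataset 1.0” reflects Mobvoi's commitment to advancing artificial intelligence technology and its positive stance on data sharing and open innovation. As the dataset comes into wider use, it is expected to have a far-reaching impact on natural language processing research.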

English title: Open-Source “Sequence Monkey” Dataset by Mobvoi
English keywords: Natural Language Processing, Open Source, Dataset
English news content:
Mobvoi, an artificial intelligence company, has announced that it is releasing a portion of the training data for its large-scale language model “Sequence Monkey” as an open-source dataset. Known as “Sequence Monkey Open Dataset 1.0,” the dataset includes a general Chinese text corpus, a corpus of classical poems with modern Chinese translations, and a text generation corpus. This initiative marks a significant contribution to the sharing of resources in the field of natural language processing. The release provides a valuable resource for researchers and offers developers a platform to learn and practice. The widespread application of this dataset is expected to have a profound impact on natural language processing research.

[Source] https://mp.weixin.qq.com/s/oSQR3gCCDpJ3Wdu-9iTcbA