Mobvoi Opens the First Open-Source Dataset for "Sequence Monkey"
Keywords: Open Source Data, Language Model, Text Corpus
[Beijing] Mobvoi (出门问问), a well-known artificial intelligence company, recently announced that it will open part of the training data for "Sequence Monkey" (序列猴子), its very-large-scale language model developed in-house. The release is named "Sequence Monkey Open Source Dataset 1.0". It contains a Chinese general-purpose text corpus, a corpus of classical poems paired with modern Chinese translations, and a text generation corpus, and is intended to promote open research and collaboration in artificial intelligence.
Opening the "Sequence Monkey" dataset gives researchers and developers a high-quality data resource for model training, algorithm optimization, and application development. Mobvoi said that by sharing the data it hopes to advance artificial intelligence technology and to give academia and industry worldwide a shared platform for progress.
According to the announcement, "Sequence Monkey Open Source Dataset 1.0" covers a broad range of Chinese text, including everyday conversation, news reports, and online comments, along with a number of classical poems rendered into modern Chinese. This data helps models learn to understand and generate Chinese content more accurately, while the text generation corpus helps models learn to produce coherent, meaningful text.
The move has drawn wide attention within the industry and is seen as an important step toward open collaboration in artificial intelligence. As the technology continues to develop, data sharing and open collaboration are expected to become important drivers of progress in the field.
Source: https://mp.weixin.qq.com/s/oSQR3gCCDpJ3Wdu-9iTcbA