Hugging Face Open-Sources Cosmopedia, the World's Largest AI Synthetic Dataset

By 智能小编

May 22, 2024 #Artificial Intelligence, #Open-Source Datasets, #Text Generation, #Daily AI News

Hugging Face recently announced the open-source release of Cosmopedia, an AI training dataset billed as the largest open synthetic dataset to date. The dataset was generated with the Mixtral 8x7B model and contains more than 30 million text files totaling 25 billion tokens. It covers a range of content types, including textbooks, blog posts, stories, and WikiHow-style tutorials. Its release gives researchers a sizable open data resource for training and improving AI models.

Source: https://www.ithome.com/0/751/688.htm
