Title: China Telecom Open-Sources StarChat Semantic Large Model TeleChat-7B
Keywords: China Telecom, Open Source, StarChat Semantic Large Model
News content:
China Telecom announced on January 10 that it has open-sourced the 7B version of its StarChat Semantic Large Model, TeleChat-7B, and has also released a 1TB cleaned dataset for researchers and developers. The move is an important step in China Telecom's efforts to promote open collaboration and shared development in artificial intelligence. According to reports, the StarChat Semantic Large Model was developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. on a bilingual Chinese-English corpus of 1.5 trillion tokens, and is described as representing the current state of the art in Chinese natural language processing.
China Telecom said that open-sourcing TeleChat-7B will promote exchange and cooperation between academia and industry and drive the development of Chinese natural language processing technology. The company also plans to open-source a 12B version of the model on January 20, further broadening collaborative development of the open-source large-model ecosystem.
Source: https://www.ithome.com/0/744/969.htm