Megvii Launches Vary, a Multimodal Large Model That Converts Document Images to Markdown with One Click

By 智能小编 (AI Editor)

February 2, 2024 #multimodal large model, #document-level OCR, #Megvii, #Daily AI News

News Report

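A research team at Megvii recently released Vary, a multimodal large model that supports document-level OCR and converts document images directly into Markdown, greatly simplifying what used to be a complex task. Previously, turning a document image into Markdown required several separate steps: text recognition, layout detection and reading-order sorting, formula and table processing, and text cleaning. Now a single one-sentence command is enough for Vary to produce the document output end to end.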

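Vary supports both Chinese and English and aims to give users a more convenient and efficient way to process documents. Whether in academic research, office work, or personal writing, Vary can help users quickly convert paper documents or images into Markdown that is easy to organize and edit.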

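The launch of Vary highlights a breakthrough for China's artificial intelligence technology in natural language processing and gives users a new document processing experience. As artificial intelligence continues to advance, multimodal large models of this kind are expected to show their potential in more scenarios and help industries achieve efficient, intelligent digital transformation.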

Source: https://www.qbitai.com/2023/12/109275.html