Megvii Launches Vary, a Multimodal Large Model That Converts Document Images to Markdown in One Click

By 智能小编 (AI Editor)

February 6, 2024 #Multimodal Large Models, #Document-Level OCR, #Megvii, #Daily AI News

Megvii's research team recently released Vary, a multimodal large model that supports document-level OCR and can convert a document image directly into Markdown. Previously, converting a document image into Markdown required several separate steps, including text recognition, layout detection and reading-order sorting, formula and table processing, and text cleaning. With Vary, a single one-sentence instruction is enough to produce the document output end to end.

Vary supports both Chinese and English and significantly improves the efficiency of document processing, which should be a real convenience for researchers, editors, students, and others who work with scanned or photographed documents. Vary can also be applied in other scenarios, such as converting paper documents into electronic ones or extracting and organizing the text in images.

Source: https://www.qbitai.com/2023/12/109275.html