News title: Megvii Releases Vary, a Multimodal Large Model That Can Convert Documents to Markdown with One Click
Keywords: Megvii, multimodal large model, document-level OCR, Markdown format
News content:
Megvii, a well-known Chinese artificial intelligence company, recently proposed Vary, a multimodal large model that supports document-level OCR. The model can convert a document image directly into Markdown with one click, greatly simplifying the document processing workflow. Previously, converting a document image into Markdown required several separate steps: text recognition, layout detection and reading-order sorting, formula and table processing, and text cleaning. Now a single one-sentence command is enough to produce the document output end to end.
The release of Vary marks another step forward for artificial intelligence in document processing. The model recognizes both Chinese and English and can handle a variety of document formats, making document processing more intelligent and efficient. For researchers, editors, students, and anyone else who needs to convert document images into Markdown, this is a welcome development.
Source: https://www.qbitai.com/2023/12/109275.html