Megvii, a leading Chinese artificial intelligence company, has released another multimodal large model, Vary, which supports document-level OCR in both Chinese and English. Previously, converting a document image into Markdown required a tedious chain of steps: text recognition, layout detection and reading-order sorting, formula and table parsing, and text cleanup. Now a single-sentence command is enough: Vary, built by Megvii's research team, outputs the converted document end to end, making "one-click conversion" a reality.
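
To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two workflows described above. Every function in it (detect_layout, recognize, clean_and_join, end_to_end) is a stub invented for this example; none of them is part of Vary's actual API or of any particular OCR library. The point is only structural: the traditional route chains several specialized stages, while an end-to-end multimodal model maps an image plus a one-sentence instruction directly to Markdown.

```python
"""Illustrative contrast between a multi-stage OCR pipeline and a one-prompt,
end-to-end conversion. All functions are stand-in stubs, not Vary's real API."""

from dataclasses import dataclass


@dataclass
class Block:
    kind: str      # "text", "table", or "formula"
    content: str   # recognized content for this region


def detect_layout(image_path: str) -> list[Block]:
    """Stand-in for a layout-detection model that also fixes reading order."""
    return [
        Block("text", "Section heading and body text..."),
        Block("table", "| a | b |\n|---|---|\n| 1 | 2 |"),
    ]


def recognize(block: Block) -> str:
    """Stand-in for per-block recognizers (plain OCR, table and formula parsers)."""
    return block.content


def clean_and_join(parts: list[str]) -> str:
    """Stand-in for text cleanup and final Markdown assembly."""
    return "\n\n".join(p.strip() for p in parts)


def traditional_pipeline(image_path: str) -> str:
    """The old route: several specialized stages chained together."""
    blocks = detect_layout(image_path)       # step 1: layout detection + ordering
    parts = [recognize(b) for b in blocks]   # step 2: per-block recognition
    return clean_and_join(parts)             # step 3: cleanup and assembly


def end_to_end(image_path: str, instruction: str) -> str:
    """The new route: one multimodal-model call, image plus a one-sentence
    instruction in, Markdown out. Here it just returns a placeholder string."""
    return f"<Markdown rendered from {image_path} per: {instruction!r}>"


if __name__ == "__main__":
    print(traditional_pipeline("page.png"))
    print(end_to_end("page.png", "Convert this document image to Markdown."))
```

In practice, each stub would be backed by its own model or tool, whereas in the end-to-end case a single vision-language model such as Vary handles the whole conversion.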

According to the report, Vary uses advanced artificial intelligence techniques to recognize and understand every sentence and paragraph in a document, enabling faithful translation and output. It supports both Chinese and English modes; users simply select the language they need.

The release of Vary not only gives users a more convenient and efficient document-level OCR service, but also further lowers the barrier to applying AI, offering more stable and reliable technical support across industries.

Title: Megvii's multimodal large model Vary enables document-level OCR; Chinese-English conversion takes just one sentence

Keywords: Megvii, multi-modal, Vary, document-level OCR, Chinese-English translation

Source: https://www.qbitai.com/2023/12/109275.html
