

News Title: “Domestic First-of-Its-Kind UI Large Model Unveiled: A Powerful Assistant for Designers”

Keywords: UI Large Model, Designer Assistant, AI Development

News Content: At the IXDC2024 International Experience Design Conference, Motiff, an AI-era design tool, unveiled its proprietary UI multimodal large model, the Motiff Large Model. It is the first large model developed anywhere by a company specializing in UI design tools. The model excels at understanding user interfaces and executing open-ended instructions, surpassing GPT-4 and Apple's Ferret UI on all metrics across five industry-recognized UI capability benchmarks. It also outperforms Google's ScreenAI on two key metrics, Screen2Words (interface description and inference) and Widget Captioning (component description), with its Widget Captioning score of 161.77 setting a new state of the art (SoTA).

The Motiff Large Model demonstrates exceptional performance in understanding user interfaces. It can identify every image, icon, and piece of text in an interface, distinguish more than 40 fine-grained UI component types, and precisely annotate the region coordinates of each element. It can also answer questions about a user interface, infer functionality from interface information, and describe interface content in detail. Compared with GPT-4, Ferret UI, and ScreenAI, the Motiff Large Model holds a significant advantage in interface analysis.
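To make the element-plus-coordinates output concrete, here is a minimal sketch of what a structured response from a UI-understanding model might look like. The schema, field names, and sample values are invented for illustration; they are not Motiff's actual API or output format.

```python
# Hypothetical schema for a UI-understanding model's output: each detected
# element carries a type, a description, and normalized region coordinates.
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str                                 # e.g. "icon", "text", "button"
    label: str                                # the model's description
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1), normalized 0-1

def describe_screen(elements: list[UIElement]) -> str:
    """Join element descriptions into a Screen2Words-style caption."""
    return "; ".join(f"{e.kind} '{e.label}' at {e.bbox}" for e in elements)

# Invented example screen with two detected elements.
screen = [
    UIElement("icon", "search", (0.82, 0.02, 0.90, 0.06)),
    UIElement("button", "Sign in", (0.35, 0.88, 0.65, 0.94)),
]
print(describe_screen(screen))
```

A real model would populate such a structure from pixels; the point here is only the shape of the task, pairing each fine-grained component with its region coordinates.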

The Motiff Large Model's understanding and expression are also the closest to human. Previous solutions, such as Ferret UI and ScreenAI, struggled to interpret an icon's meaning from context. By collecting a large volume of high-quality UI-domain data through manual annotation and other methods, the Motiff Large Model can recognize and point out the different meanings the same icon takes on in different interfaces, significantly improving the accuracy and contextual relevance of its descriptions.

The Motiff Large Model also offers interactive guidance: it can suggest operation steps based on the user's request and, with the user's permission, complete those operations on the user's behalf. This lays the groundwork for a revolution in interface interaction. In the future, users may no longer need to tap the screen manually; voice or image input alone could operate a device. Assistants such as Siri could become the new entry point to every app, giving rise to truly intelligent phones and computers and opening a new era of software paradigms and interface interaction.
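The propose-then-act-with-permission pattern described above can be sketched in a few lines. The step planner below is a canned stub standing in for a model call, and all function names are hypothetical; this is an illustration of the control flow, not Motiff's implementation.

```python
# Minimal sketch of permission-gated UI automation: the model proposes steps,
# and each step runs only after the user approves it.
from typing import Callable

def plan_steps(request: str) -> list[str]:
    # Stand-in for a model call; returns a fixed plan for illustration.
    return [
        f"Open the app relevant to: {request}",
        "Navigate to the target control",
        "Confirm the action",
    ]

def run_with_permission(request: str,
                        approve: Callable[[str], bool]) -> list[str]:
    executed = []
    for step in plan_steps(request):
        if not approve(step):   # ask the user before acting on their behalf
            break               # stop at the first denied step
        executed.append(step)   # a real agent would drive the UI here
    return executed

# A user who approves everything sees all three steps executed.
done = run_with_permission("mute notifications", approve=lambda s: True)
```

The key design choice is that approval gates every step, so the agent never acts beyond what the user has explicitly allowed.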

Industry observers argue that a sharp drop in error rates marks AI's move toward a "technological singularity" at which it completes work independently rather than serving as a mere auxiliary tool. One of the core problems facing large models today is their high error rate: GPT-4 shows error rates of 30% to 40% on several metrics, rising to 50% to 60% in the UI domain. By keeping its error rate in the single digits, the Motiff Large Model achieves a significant technical breakthrough.

Source: https://www.jiqizhixin.com/articles/2024-08-19-7

