AI视界大突破：V-JEPA解锁物理世界理解

Meta推出V-JEPA视觉模型，让AI通过视频理解物理世界

科技巨头Meta近日推出了一款名为V-JEPA（视频联合嵌入预测架构）的视觉模型，该模型可以通过观看视频来学习理解物理世界。

V-JEPA旨在通过形成周围环境的内部模型，使人工智能具备计划、推理和执行复杂任务的能力。该模型通过观察视频中的物体和事件，学习物体之间的关系、物体的运动规律以及物理世界的因果关系。

Meta的研究人员表示，V-JEPA在各种任务上都表现出了出色的性能，包括物体识别、场景理解、动作预测和导航。例如，在物体识别任务中，V-JEPA能够准确识别视频中出现的所有物体，即使这些物体被遮挡或部分隐藏。在场景理解任务中，V-JEPA能够理解视频中发生的事件，例如人物在做什么、物体如何移动以及场景中的空间关系。

V-JEPA的开发是人工智能领域的一项重大突破。它使人工智能能够通过观看视频来学习理解物理世界，而无需通过大量的人工标注数据进行训练。这将极大地提高人工智能在各种应用中的实用性，例如机器人、自动驾驶汽车和虚拟现实。

Meta表示，V-JEPA目前仍在开发阶段，但计划在未来将其整合到公司的各种产品和服务中。例如，V-JEPA可以用于改善Meta的社交媒体平台上的视频推荐，或用于开发更智能的虚拟现实体验。

English version:

**Headline:** AI Vision Breakthrough: V-JEPA Unlocks Understanding of the Physical World

**Keywords:** Vision model, physical world, artificial intelligence

**Body:**

Meta has unveiled V-JEPA, a vision model that enables AI to learn to understand the physical world by watching videos.

The V-JEPA (Video Joint Embedding Predictive Architecture) model is designed to give AI the ability to plan, reason, and perform complex tasks by forming an internal model of its surroundings. By observing objects and events in videos, the model learns the relationships between objects, how objects move, and the cause-and-effect relationships in the physical world.

Meta researchers say that V-JEPA has demonstrated strong performance on a range of tasks, including object recognition, scene understanding, action anticipation, and navigation. For example, on an object recognition task, V-JEPA was able to accurately identify all of the objects that appeared in a video, even when they were occluded or partially hidden. On a scene understanding task, V-JEPA was able to understand the events happening in a video, such as what people were doing, how objects were moving, and the spatial relationships in the scene.

The development of V-JEPA is a significant breakthrough in the field of artificial intelligence. It enables AI to learn to understand the physical world by watching videos, rather than having to be trained on large amounts of manually labeled data. This could greatly increase the practicality of AI in a range of applications, such as robotics, self-driving cars, and virtual reality.

Meta says that V-JEPA is still under development, but the company plans to integrate it into a range of its products and services in the future. For example, V-JEPA could be used to improve video recommendations on Meta’s social media platforms or to develop more intelligent virtual reality experiences.

Source: https://www.maginative.com/article/meta-is-teaching-ai-to-understand-and-model-the-real-world-by-watching-videos/