News Title: “Meta Releases New Open-Source Segmentation Model: Easily Segment Videos”

Keywords: Open Source, Video Segmentation, Significant Progress

News Content:
Meta recently announced that its Segment Anything Model 2 (SAM 2) has been officially open-sourced. The model handles not only static images but also real-time, promptable object segmentation in video, marking a significant advance in the field of computer vision.

SAM 2 builds on the original SAM model released in April last year. It can segment any object, even objects and visual domains the model has never seen before, making it a powerful general-purpose segmentation tool. Its release substantially changes the video segmentation experience and enables seamless use across image and video applications.
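The article itself contains no code, but promptable image segmentation is easy to illustrate. The sketch below shows how a single point click might be turned into a mask with the open-sourced package; the class and method names (build_sam2, SAM2ImagePredictor, set_image, predict) follow the public sam2 repository, and the config, checkpoint, and image paths are placeholders to adapt to the actual release.

```python
# Minimal sketch: promptable image segmentation from one foreground click.
# Names follow the open-source sam2 repository; all paths are placeholders
# and a CUDA-capable GPU is assumed.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

# Embed the image once, then prompt it as many times as needed.
image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# A single (x, y) foreground point requests a mask for the object under it.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),   # 1 = foreground, 0 = background
    multimask_output=True,        # return several candidate masks
)
print(masks.shape, scores)        # (num_masks, H, W) boolean masks and quality scores
```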

SAM 2 exceeds the original model's image segmentation accuracy and achieves better video segmentation performance than existing work, while requiring only one third of the interaction time, which significantly improves the user experience. Architecturally, the model uses an innovative streaming memory design that processes video frames sequentially, making it particularly well suited to real-time applications.
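To make the streaming, frame-by-frame workflow concrete, here is a minimal sketch of driving the video predictor from the open-sourced code: one point prompt on a single frame is propagated through the rest of the video, with each step reusing the model's memory of earlier frames. The function names (build_sam2_video_predictor, init_state, add_new_points, propagate_in_video) follow the public sam2 repository at release time; paths and prompt coordinates are placeholders.

```python
# Minimal sketch: prompt one frame, then propagate masks through the video.
# Names follow the open-source sam2 repository; paths are placeholders
# and a CUDA-capable GPU is assumed.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state loads the frames and sets up the streaming memory.
    state = predictor.init_state(video_path="videos/example_frames_dir")

    # One foreground click on frame 0 defines the object to track (obj_id=1).
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Frames are processed sequentially; each step only needs the memory of
    # earlier frames, which is what makes near-real-time use possible.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # per-object boolean masks
        # ... overlay or save the masks for this frame
```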

Meta also released a large annotated dataset of approximately 51,000 real-world videos and more than 600,000 masklets, 4.5 times more videos and 53 times more annotations than the largest existing dataset. The SAM 2 paper also mentions another dataset of over 100,000 videos, although that one has not been made public.

The open-source release of SAM 2 means it can be used free of charge, and the model is already hosted on platforms such as Amazon SageMaker. Users can try segmenting and tracking objects in videos through the web demo at https://sam2.metademolab.com/demo.

The release of SAM 2 has drawn widespread attention. Early users have already tried it on test videos of their own, beyond those provided with the demo, and have praised its performance highly. Some even argue that SAM 2 may make other related technologies pale in comparison.

The release of SAM 2 demonstrates the vast potential of AI for handling video content: the model can be applied to a wide range of practical use cases and may become a component of larger AI systems, opening new paths for future technological development.

Source: https://www.jiqizhixin.com/articles/2024-07-30-5
