Pengcheng Lab Releases Million-Scale Embodied Intelligence Dataset, Breaking Data Barriers

Keywords: Embodied Intelligence, Open-source Data, Pengcheng Lab

Report by Jiqizhixin (Machine Heart)

Embodied intelligence, an emerging direction in artificial intelligence, is rapidly gaining traction in academia and industry. However, because data acquisition is expensive, the lack of high-quality, large-scale embodied datasets has long been a major bottleneck for the field. To address this, the Multi-Agent and Embodied Intelligence Institute of Pengcheng Lab, in collaboration with Southern University of Science and Technology and Sun Yat-sen University, has officially released and open-sourced its latest academic work in embodied intelligence: the ARIO (All Robots In One) large-scale embodied dataset.

The ARIO dataset contains perception data in five modalities: 2D, 3D, text, touch, and sound. It covers two main categories of tasks, manipulation and navigation, and includes both simulated and real-world data collected on a variety of robot hardware, which makes the data very rich. The dataset reaches a scale of three million while keeping all data in a unified format. It is presented as an open-source dataset for embodied intelligence that achieves high quality, diversity, and large scale at the same time.

“Embodied intelligence data is inherently more complex than pure image and text data, and requires recording many control parameters,” said the team led by Professor Lin Liang at Pengcheng Lab. “Without a unified format, aggregating data from many types of robots takes a great deal of extra preprocessing effort.” To address this, the team first designed a format standard for embodied big data. The standard can record control parameters for robots of many different forms, organizes the data in a clearly structured way, and accommodates sensors running at different frame rates by recording the corresponding timestamps, so that it meets the precise timing requirements that large embodied intelligence models place on perception and control.
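The timestamp requirement described above can be illustrated with a small sketch. This is not the ARIO format itself, whose details are not given in this article. It only shows, under assumed names and rates, how streams recorded at different frame rates can be paired by timestamp. The `Sample` type, the functions `nearest_index` and `align_streams`, and the 30 Hz camera and 100 Hz joint-state rates are all hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One timestamped reading from a sensor or controller (hypothetical type)."""
    t: float    # timestamp in seconds
    value: Any  # payload, e.g. an image id or a joint-angle vector


def nearest_index(times: List[float], t: float) -> int:
    """Index of the entry in the sorted list `times` that is closest to `t`."""
    if not times:
        raise ValueError("times is empty")
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick the closer neighbour; ties go to the earlier sample.
    return i - 1 if t - times[i - 1] <= times[i] - t else i


def align_streams(reference: List[Sample],
                  other: List[Sample]) -> List[Tuple[Sample, Sample]]:
    """Pair each reference sample with the nearest-in-time sample of `other`.

    Both streams are assumed to be sorted by timestamp.
    """
    times = [s.t for s in other]
    return [(ref, other[nearest_index(times, ref.t)]) for ref in reference]


if __name__ == "__main__":
    # Assumed rates: camera at 30 Hz, joint states at 100 Hz.
    camera = [Sample(i / 30, f"frame_{i}") for i in range(3)]
    joints = [Sample(j / 100, j) for j in range(10)]
    pairs = align_streams(camera, joints)
    print([(c.value, j.value) for c, j in pairs])
    # prints [('frame_0', 0), ('frame_1', 3), ('frame_2', 7)]
```

Nearest-timestamp matching is only one possible choice. Interpolating between neighbouring samples, or rejecting pairs whose time gap exceeds a threshold, may suit control data better. Either way, per-sample timestamps are what make such alignment possible when sensors do not share a frame rate.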

The release of the ARIO dataset provides a valuable resource for embodied intelligence research and is expected to advance both research and applications in the field. Open-sourcing the dataset also reflects Pengcheng Lab’s technological leadership in embodied intelligence and its commitment to open sharing.

Paper Link:

http://arxiv.org/abs/2408.10899

Project Homepage:

https://imaei.github.io/project_pages/ario/

Pengcheng Lab Embodied Intelligence Institute Website:

https://imaei.github.io/

About Pengcheng Lab:

Pengcheng Lab is a key science and technology innovation platform in Shenzhen’s construction of a pilot demonstration zone for socialism with Chinese characteristics. It aims to build a national-level basic research platform and strategic scientific and technological force. The lab focuses on areas such as artificial intelligence, network communication, and new materials, conducting cutting-edge basic research and tackling key technological challenges.

Source: https://www.jiqizhixin.com/articles/2024-08-23-2
