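Shanghai Jiao Tong University and the Shanghai AI Laboratory have jointly released SAT (Segment Anything in radiology scans, driven by Text prompts), the world’s first text-prompted, general-purpose large segmentation model for 3D medical images. The work is a significant advance in medical image processing and offers a more efficient and convenient approach to clinical tasks such as diagnosis, surgical planning, and disease monitoring.
Conventional medical image segmentation models are usually customized for a specific task, which limits where they can be applied. Large language models have made notable progress in medicine, but advancing general-purpose medical AI also requires a general segmentation tool that combines language understanding with localization. SAT was built to address this problem.
The core innovation of SAT is to inject human anatomical knowledge into the text encoder so that anatomical terms are encoded precisely, yielding a general segmentation model driven by text prompts. The model can handle a wide range of segmentation needs automatically, and it can also serve as an agent tool for large language models, directly giving them localization and segmentation abilities.
To this end, the research team built a multimodal knowledge graph of more than 6,000 human anatomical concepts, drawing on UMLS, authoritative anatomical knowledge from the web, and public segmentation datasets. The knowledge graph enriches the model’s understanding and provides a solid basis for connecting the model to clinical knowledge.
The team also created SAT-DS, the largest 3D medical image segmentation dataset, which gathers more than 22,000 images with segmentation annotations from three modalities (CT, MR, and PET) and covers 497 segmentation targets across 8 major regions of the human body. On this dataset the team trained two models of different sizes, SAT-Pro (447M parameters) and SAT-Nano (110M parameters), and evaluated SAT from several angles. In the experiments, SAT performed on par with 72 specialist nnU-Net models and generalized better on out-of-domain data. Compared with MedSAM, which relies on box prompts, SAT’s text-prompted segmentation was more accurate and more efficient.
The result marks a leap for medical image processing and opens new possibilities for medical AI. With SAT, medical professionals can obtain key information faster and more accurately and provide patients with higher-quality care. The model’s open design also gives researchers and practitioners in medicine valuable resources and tools, accelerating innovation in medical AI.
In short, the release of SAT, a text-prompted general-purpose large segmentation model for 3D medical images, is a major advance in medical image processing. It improves the efficiency of diagnosis and treatment and offers a new paradigm for general and practical medical AI.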
English version:
News Title: “SAT Unveiled: World’s First Text-Prompted General Segmentation Model for Multimodal 3D Medical Images”
Keywords: General Segmentation Model, Text Prompt, Medical Image
News Content: Shanghai Jiao Tong University and the Shanghai AI Laboratory have jointly released SAT (Segment Anything in radiology scans, driven by Text prompts), the world’s first large general segmentation model for 3D medical images driven by text prompts. The work marks a significant advance in medical image processing and offers more efficient and convenient solutions for clinical tasks such as medical diagnosis, surgical planning, and disease monitoring.
Traditional medical image segmentation models are often tailored for specific tasks, limiting their application scope. While large language models have made substantial progress in the medical field, constructing a tool that possesses both language understanding and spatial localization capabilities is crucial for advancing the universality of medical AI. The SAT model’s release addresses this challenge.
The core innovation of the SAT model lies in integrating human anatomical knowledge into the text encoder so that anatomical terms are encoded precisely, which yields a general segmentation model driven by text prompts. The model can handle a wide range of segmentation needs automatically, and it can also serve as an agent tool for large language models, directly giving them localization and segmentation abilities.
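The idea of prompting a segmentation model with text can be illustrated with a toy sketch. It assumes, purely for illustration, that each voxel carries a small feature vector and that a text encoder has already turned an anatomical term into a vector of the same size; the function name, the threshold, and all numbers are hypothetical and are not taken from the SAT release.

```python
import math


def segment_with_text_prompt(voxel_features, prompt_embedding, threshold=0.5):
    """Return a binary mask: 1 where a voxel's feature vector matches the prompt.

    voxel_features: nested list [depth][height][width] of feature vectors.
    prompt_embedding: vector from a (hypothetical) text encoder.
    """
    def score(feature):
        # The dot product measures agreement between the voxel and the prompt.
        logit = sum(f * p for f, p in zip(feature, prompt_embedding))
        # The sigmoid maps the logit to a probability-like value in (0, 1).
        return 1.0 / (1.0 + math.exp(-logit))

    return [
        [[1 if score(voxel) > threshold else 0 for voxel in row] for row in plane]
        for plane in voxel_features
    ]


# Toy 1x2x2 volume with 2-dimensional features (made-up numbers).
volume = [[[[2.0, 0.0], [-1.0, 0.5]],
           [[0.5, 3.0], [-2.0, -2.0]]]]
liver_prompt = [1.0, 0.0]  # stand-in for an encoded term such as "liver"
mask = segment_with_text_prompt(volume, liver_prompt)
print(mask)
```

Because the prompt is just a vector, the same image features can in principle be queried with a different term to obtain a different mask, which is what makes a single model usable across many targets.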
To achieve this, the research team constructed a multimodal knowledge graph containing over 6,000 anatomical concepts, incorporating information from UMLS, authoritative anatomy knowledge from the web, and data from public segmentation datasets. This knowledge graph enriches the model’s understanding capabilities and lays a solid foundation for the connection between the model and clinical knowledge.
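As a rough picture of what an anatomical knowledge graph stores, the sketch below keeps a definition per concept and typed relations between concepts. It is a text-only toy with three hand-written entries; the class, the relation name `part_of`, and the entries are illustrative assumptions and do not reflect the actual schema or contents of the team’s graph.

```python
class AnatomyGraph:
    """Toy concept graph: definitions plus typed, directed relations."""

    def __init__(self):
        self.definitions = {}  # concept name -> short definition
        self.edges = {}        # concept name -> set of (relation, target)

    def add_concept(self, name, definition):
        self.definitions[name] = definition

    def add_relation(self, source, relation, target):
        self.edges.setdefault(source, set()).add((relation, target))

    def ancestors(self, name, relation="part_of"):
        """Follow one relation type upward and collect every concept reached."""
        found, stack = [], [name]
        while stack:
            current = stack.pop()
            for rel, target in sorted(self.edges.get(current, ())):
                if rel == relation and target not in found:
                    found.append(target)
                    stack.append(target)
        return found


graph = AnatomyGraph()
graph.add_concept("renal cortex", "outer layer of the kidney")
graph.add_concept("kidney", "paired organ that filters blood and produces urine")
graph.add_concept("urinary system", "organs that produce, store, and excrete urine")
graph.add_relation("renal cortex", "part_of", "kidney")
graph.add_relation("kidney", "part_of", "urinary system")

print(graph.ancestors("renal cortex"))
```

Definitions and relations of this kind are what give a text encoder more to learn from than the bare label of a segmentation target.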
Additionally, the team created SAT-DS, the largest 3D medical image segmentation dataset, comprising over 22,000 images with segmentation annotations from three modalities (CT, MR, and PET) and covering 497 segmentation targets across 8 major regions of the human body. Based on this dataset, the team trained two models of different sizes, SAT-Pro (447M parameters) and SAT-Nano (110M parameters), and evaluated SAT from multiple perspectives. The experimental results showed that SAT performs comparably to 72 specialist nnU-Net models and generalizes better on out-of-domain data. Compared with MedSAM, which uses box prompts, SAT’s text-prompted segmentation is more precise and efficient.
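The article does not say which metric underlies these comparisons. As an assumption, the sketch below uses the Dice score, a common overlap measure for segmentation, to show how a predicted mask might be compared with a reference annotation; the masks are made-up toy data.

```python
def dice_score(predicted, reference):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels marked 1 in both masks.
    overlap = sum(p and r for p, r in zip(predicted, reference))
    total = sum(predicted) + sum(reference)
    # Two empty masks agree perfectly, so define their score as 1.0.
    return 1.0 if total == 0 else 2.0 * overlap / total


predicted = [1, 1, 0, 0, 1, 0]  # toy model output, flattened
reference = [1, 0, 0, 0, 1, 1]  # toy ground-truth annotation, flattened
print(round(dice_score(predicted, reference), 3))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels, so averaging such scores over targets is one way a single general model could be compared against many specialist models.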
This achievement not only signifies a leap in the field of medical image processing but also opens up new possibilities for the future of medical AI. Through the SAT model, medical professionals can more quickly and accurately access critical information, providing patients with higher-quality medical services. Moreover, the open design of this model provides invaluable resources and tools for researchers and practitioners in the medical field, accelerating the innovation and development of medical AI.
In summary, the release of SAT, a text-prompted general segmentation model for 3D medical images, is a major advance in the field of medical image processing. It not only improves the efficiency of medical diagnosis and treatment but also offers a new paradigm for general and practical medical AI.
Source: https://www.jiqizhixin.com/articles/2024-06-27-5