**Headline:** AI Empowers the Visually Impaired: ‘MouSi’ Large Model Illuminates a Dark World
**Keywords:** Visual aid, large model, AI image description
**News Content:** Fudan University Develops ‘MouSi’ Large Model, Helping the Visually Impaired ‘See’ the World
Recently, the Natural Language Processing Laboratory of Fudan University (FudanNLP) announced the launch of the ‘Hear the World’ App, built on the multimodal large model ‘Fudan MouSi’ (MouSi), which aims to help the visually impaired ‘see’ the world.
The ‘Hear the World’ App only requires a camera and a pair of headphones to convert images into speech. The system uses the ‘MouSi’ large model independently developed by Fudan University, integrating multimodal technologies such as image recognition and natural language processing.
It is reported that the ‘Hear the World’ App offers functions such as scene description and risk alerts. After the visually impaired person puts on the headphones, the camera captures the surrounding environment in real time, and the ‘MouSi’ large model converts it into a voice description. The system can recognize objects, people, and scenes, and provides detailed voice descriptions to help the visually impaired understand their surroundings.
For example, when a visually impaired person enters a room, the ‘Hear the World’ App will describe the room’s layout, furniture placement, and other information; when a visually impaired person is walking on the street, the system will prompt for obstacles, traffic lights, and other risk information.
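The capture-describe-alert loop outlined above can be sketched in Python. This is a minimal, hypothetical illustration: the model call and the risk check are stubbed out, and names such as `describe_frame` and `detect_risks` are invented for this sketch, not part of the actual MouSi API.

```python
# Hypothetical sketch of the pipeline the article describes:
# camera frame -> scene description -> risk alert -> spoken message.
# The multimodal model call is stubbed with a canned description.

from dataclasses import dataclass


@dataclass
class Frame:
    """A single camera frame (stand-in for raw image bytes)."""
    data: bytes


def describe_frame(frame: Frame) -> str:
    # Placeholder for a multimodal model call that turns an image
    # into a natural-language scene description.
    return "A room with a table on the left and a doorway ahead."


def detect_risks(description: str) -> list:
    # Naive keyword scan standing in for the system's risk-alert logic.
    risk_words = ("obstacle", "traffic light", "stairs", "doorway")
    return [w for w in risk_words if w in description]


def narrate(frame: Frame) -> str:
    """Compose the message to be read aloud: description plus any alerts."""
    description = describe_frame(frame)
    risks = detect_risks(description)
    if risks:
        return f"{description} Caution: {', '.join(risks)}."
    return description


if __name__ == "__main__":
    print(narrate(Frame(data=b"")))
```

In a real system the stubbed description would come from the vision-language model and the final string would be fed to a text-to-speech engine and played through the headphones.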
Professor Qiu Xipeng, director of the Natural Language Processing Laboratory of Fudan University, said that the ‘MouSi’ large model is the culmination of the laboratory's years of research. The model's strong capabilities in image recognition and natural language processing provide the technical foundation for delivering assistive information to the visually impaired.
The launch of the ‘Hear the World’ App marks a major breakthrough in China’s visual aid technology field. The system will provide the visually impaired with a more convenient and efficient way of traveling and living, helping them better integrate into society.
【来源】https://www.ithome.com/0/753/295.htm