Shenzhen, China – In a significant leap forward for artificial intelligence, a joint team from Huawei and the Harbin Institute of Technology (Shenzhen) has unveiled a groundbreaking framework for long-video understanding, dubbed AdaReTaKe (Adaptively Reducing Temporal and Knowledge redundancy). This innovation promises to revolutionize how AI models process and interpret extended video content, opening doors to advancements in fields like smart security, long-term memory for intelligent agents, and deeper multimodal reasoning.

The research, spearheaded by PhD student Wang Xiao from the Harbin Institute of Technology (Shenzhen) and Huawei researcher Si Qingyi, was conducted during Wang’s internship at Huawei. Wang’s expertise lies in multimodal video understanding and generation, while Si focuses on multimodal understanding, Large Language Model (LLM) post-training, and efficient inference.

The increasing prevalence and importance of video content have presented a critical challenge for multimodal large models: how to effectively process and understand long-duration videos. The ability to comprehend these extended narratives is crucial for various applications, demanding a solution that can efficiently handle the vast amount of information contained within.

AdaReTaKe addresses this challenge head-on by dynamically compressing redundant information during inference, enabling multimodal large models to process videos up to eight times longer (reaching an impressive 2048 frames) without requiring additional training. This adaptive redundancy reduction approach allows the models to focus on the most salient aspects of the video, significantly improving efficiency and performance.
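The paper details the actual mechanism; as a rough intuition for what "dynamically compressing redundant information at inference time" can mean, the toy sketch below drops video frames whose feature vectors are nearly identical to the last kept frame, then subsamples to a fixed budget. Everything here (the function names, the cosine-similarity criterion, the threshold) is a hypothetical illustration, not AdaReTaKe's algorithm, which operates on model-internal representations with adaptive compression ratios.

```python
import math

def cosine_similarity(a, b):
    # Plain cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def reduce_temporal_redundancy(frame_features, budget, threshold=0.98):
    """Toy temporal-redundancy reduction (illustrative only).

    Skips frames that are near-duplicates of the last kept frame,
    then uniformly subsamples the survivors down to `budget` frames.
    Returns the indices of the kept frames.
    """
    kept = []
    for i, feat in enumerate(frame_features):
        if kept and cosine_similarity(frame_features[kept[-1]], feat) > threshold:
            continue  # temporally redundant: nearly identical to last kept frame
        kept.append(i)
    if len(kept) > budget:
        step = len(kept) / budget
        kept = [kept[int(j * step)] for j in range(budget)]
    return kept
```

Because such filtering happens purely at inference time, no retraining of the underlying multimodal model is needed; the freed-up context budget is what lets far more frames fit into a fixed token limit.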

The impact of AdaReTaKe is already being felt within the AI research community. The framework has achieved top rankings on several prominent long-video understanding benchmarks, including VideoMME, MLVU, LongVideoBench, and LVBench. It surpasses comparable open-source models by 3-5% on these benchmarks, establishing a new state-of-the-art for long-video understanding.

AdaReTaKe represents a significant step toward more intelligent and efficient video analysis. By dynamically reducing redundancy, the framework enables AI models to process and understand longer videos with greater accuracy and speed, paving the way for a wide range of new applications.

The success of AdaReTaKe highlights the power of collaboration between industry and academia. By combining Huawei’s expertise in AI and large-scale computing with the Harbin Institute of Technology (Shenzhen)’s cutting-edge research in multimodal understanding, the team has delivered a truly innovative solution to a critical challenge in the field.

The team’s paper, titled AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video, details the framework’s architecture and performance. The framework’s success suggests a promising future for AI-powered video understanding, with potential applications spanning security, robotics, and beyond. Further research will likely focus on optimizing the redundancy reduction process and exploring the application of AdaReTaKe to even longer and more complex video sequences.

