Mora: A Multi-Agent Framework for 12-Second Video Generation

Researchers from Microsoft and Lehigh University have unveiled Mora, a multi-agent framework designed for general video generation tasks. The framework aims to mimic and expand upon OpenAI’s Sora video generation model. Mora’s core principle is collaboration among multiple visual agents to produce high-quality video content. By breaking the video generation process into sub-tasks and assigning a dedicated agent to each, Mora supports a range of video generation capabilities.

Mora’s Key Features:

  • Text-to-Video Generation: Mora can automatically generate video content from user-provided text descriptions, ranging from simple scene descriptions to complex storylines.
  • Image-to-Video Generation: Beyond direct text-based generation, Mora can use a user-supplied initial image together with a text prompt to create a matching video sequence, enhancing content richness and detail.
  • Extended Video Generation: Besides generating videos from scratch, Mora can extend and edit existing video content by adding new elements or increasing the duration.
  • Video-to-Video Editing: Mora lets users modify videos based on text instructions. This includes altering scenes, adjusting object properties, or adding new elements.
  • Video Concatenation: Mora connects two or more video clips with smooth transitions. This feature is suited to producing video compilations or edits.
  • Simulating Digital Worlds: Mora can create and simulate digital worlds, generating video sequences with a digital-world aesthetic based on text descriptions. Examples include game scenes or virtual environments.

How Mora Works:

Mora operates on a multi-agent framework, employing multiple specialized AI agents to accomplish video generation tasks. Each agent handles a specific sub-task, collectively forming the complete video generation process.

Mora’s workflow involves the following steps:

  1. Task Decomposition: Mora breaks down complex video generation tasks into multiple sub-tasks, each handled by a dedicated agent.
  2. Agent Role Definition: Mora defines five fundamental agent roles:
    • Prompt Selection and Generation Agent: Uses large language models (LLMs) such as GPT-4 or Llama to optimize and select text prompts, enhancing the relevance and quality of generated images.
    • Text-to-Image Generation Agent: Converts text prompts into high-quality initial images.
    • Image-to-Image Generation Agent: Modifies given source images based on text instructions.
    • Image-to-Video Generation Agent: Transforms static images into dynamic video sequences.
    • Video Concatenation Agent: Creates smooth transitions between two input videos.
  3. Workflow: Based on task requirements, Mora automatically organizes agents to execute sub-tasks in a specific order. For instance, text-to-video generation might involve:
    • The Prompt Selection and Generation Agent processing the text prompt.
    • The Text-to-Image Generation Agent generating an initial image based on the optimized text prompt.
    • The Image-to-Video Generation Agent converting the initial image into a video sequence.
    • The Video Concatenation Agent (if needed) connecting multiple video clips into a cohesive video.
  4. Multi-Agent Collaboration: Agents communicate and collaborate through predefined interfaces and protocols, ensuring the coherence and consistency of the entire video generation process.
  5. Generation and Evaluation: Upon completing its sub-task, each agent passes its result to the next agent until the entire video generation process is complete. The generated video is then evaluated against predefined quality standards.
  6. Iteration and Optimization: Mora’s framework allows for iterative improvements in video generation quality. Agents can adjust their parameters based on feedback to enhance performance.
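
The text-to-video ordering described in step 3 can be pictured as a chain of callables, each taking the previous agent's output. The following is only a minimal sketch of that idea, not Mora's actual code: the `Artifact` class, the agent function names, and `run_pipeline` are all hypothetical, and the string-building bodies are stand-ins for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Artifact:
    """Data handed from one agent to the next (all fields are illustrative)."""
    prompt: str
    image: Optional[str] = None
    video: Optional[str] = None
    history: List[str] = field(default_factory=list)


# An agent is modeled as any function that takes an Artifact and returns one.
Agent = Callable[[Artifact], Artifact]


def prompt_agent(a: Artifact) -> Artifact:
    # Stand-in for an LLM call that refines the user's prompt.
    a.prompt = a.prompt.strip() + ", highly detailed"
    a.history.append("prompt")
    return a


def text_to_image_agent(a: Artifact) -> Artifact:
    # Stand-in for a text-to-image model; records a placeholder string.
    a.image = f"image<{a.prompt}>"
    a.history.append("text_to_image")
    return a


def image_to_video_agent(a: Artifact) -> Artifact:
    # Stand-in for an image-to-video model; requires an initial image.
    if a.image is None:
        raise ValueError("image_to_video needs an initial image")
    a.video = f"video<{a.image}>"
    a.history.append("image_to_video")
    return a


def run_pipeline(agents: List[Agent], artifact: Artifact) -> Artifact:
    # Each agent passes its result to the next one, in order.
    for agent in agents:
        artifact = agent(artifact)
    return artifact


# Hypothetical ordering for the text-to-video task described in step 3.
TEXT_TO_VIDEO: List[Agent] = [prompt_agent, text_to_image_agent, image_to_video_agent]

if __name__ == "__main__":
    result = run_pipeline(TEXT_TO_VIDEO, Artifact(prompt="a cat surfing"))
    print(result.history)
    print(result.video)
```

Under this sketch, other tasks would amount to different agent lists; for example, image-to-video generation might start from an `Artifact` that already carries an image and skip the text-to-image stage.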

Current Capabilities and Limitations:

Mora can generate 1024×576 videos lasting 12 seconds (75 frames in total), but it shows a noticeable performance gap compared to Sora on scenes with extensive object movement. Additionally, attempts to generate videos longer than 12 seconds result in a significant decline in video quality.
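
As a quick sanity check on the figures above, the short calculation below derives the frame rate and aspect ratio they imply. It assumes the 75 frames are spread evenly over the full 12 seconds, which the article does not state explicitly.

```python
from fractions import Fraction

# Figures quoted in the article.
frames = 75
duration_s = 12
width, height = 1024, 576

# Implied frame rate, assuming the frames are evenly spaced over the clip.
fps = frames / duration_s

# Aspect ratio reduced to lowest terms.
aspect = Fraction(width, height)

print(f"{fps:.2f} frames per second")                        # 6.25 frames per second
print(f"{aspect.numerator}:{aspect.denominator} aspect ratio")  # 16:9 aspect ratio
```

Under that assumption the implied rate is well below the 24 to 30 frames per second typical of conventional video.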

Future Implications:

Mora’s multi-agent approach represents a significant advancement in video generation technology. Its ability to handle complex tasks and generate diverse video content holds immense potential for various applications, including:

* Content Creation: Simplifying video creation for individuals and businesses.
* Education and Training: Developing interactive and engaging educational materials.
* Entertainment: Producing high-quality animated content and visual effects.
* Research and Development: Facilitating research in areas like computer vision and artificial intelligence.

Availability:

The source code and models for Mora are expected to be open-sourced on GitHub: https://github.com/lichao-sun/Mora. The research paper detailing Mora’s architecture and performance is available on arXiv: http://arxiv.org/abs/2403.13248.

Mora’s emergence marks a step towards more accessible and versatile video generation tools. As research continues and the framework evolves, we can anticipate more sophisticated and creative applications of this technology.

Source: https://ai-bot.cn/mora-video-generation-framework/
