Cradle: A New Era of AI Agents for General Computer Control
Beijing, China – A new AI framework named Cradle promises to change how AI interacts with computers. Developed by Kunlun Wanwei in collaboration with institutions including the Beijing Academy of Artificial Intelligence (BAAI), Nanyang Technological University, and Peking University, Cradle is a multi-modal AI agent framework designed for General Computer Control (GCC). Unlike previous AI agents, Cradle does not require extensive training: it controls computers the way humans do, through keyboard and mouse interactions, without relying on internal APIs.
Cradle’s ability to interact with any software, whether open-source or proprietary, sets it apart. It is the first AI framework capable of playing various commercial games and operating a wide range of software applications. The project, including its research paper, code, and documentation, has been made publicly available.
Cradle’s Key Features:
- Multi-modal Information Gathering: Cradle gathers information from both visual (screen images) and auditory (sound) inputs, mimicking human perception to understand computer interfaces and environments.
- Self-Reflection: Cradle continuously evaluates the success of its actions, analyzing failures to guide future decisions.
- Task Inference: Based on the current environment and historical data, Cradle infers and selects the optimal next task.
- Skill Planning: Cradle generates and updates skills relevant to the given task, adapting to different computer operation needs.
- Action Planning: Cradle translates strategies into executable commands by generating specific keyboard and mouse control actions.
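To make the Action Planning feature concrete, here is a minimal Python sketch of how a high-level step might be turned into keyboard and mouse commands. It is an illustration only, not Cradle's actual code: the names `KeyPress`, `MouseClick`, `plan_actions`, and `describe`, as well as the step names and screen coordinates, are assumptions invented for this example. The sketch prints the actions instead of sending real input events.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class KeyPress:
    """Hypothetical keyboard action: hold `key` for `duration` seconds."""
    key: str
    duration: float = 0.1


@dataclass(frozen=True)
class MouseClick:
    """Hypothetical mouse action: click `button` at pixel (x, y)."""
    x: int
    y: int
    button: str = "left"


Action = Union[KeyPress, MouseClick]


def plan_actions(step: str) -> List[Action]:
    """Map an assumed high-level step name to low-level input actions."""
    if step == "open_menu":
        return [KeyPress("esc")]
    if step == "confirm_dialog":
        return [MouseClick(x=640, y=400), KeyPress("enter")]
    if step == "move_forward":
        return [KeyPress("w", duration=1.0)]
    raise ValueError(f"unknown step: {step}")


def describe(actions: List[Action]) -> List[str]:
    """Render actions as text rather than sending real input events."""
    lines = []
    for action in actions:
        if isinstance(action, KeyPress):
            lines.append(f"press {action.key} for {action.duration:.1f}s")
        else:
            lines.append(f"{action.button}-click at ({action.x}, {action.y})")
    return lines


if __name__ == "__main__":
    for line in describe(plan_actions("confirm_dialog")):
        print(line)
```

Keeping actions as plain data makes them easy to log and inspect before anything touches the real keyboard or mouse. In an actual agent, the fixed mapping in `plan_actions` would be replaced by output from the model.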
Technical Principles Behind Cradle:
- Multi-modal Input Processing: Cradle receives and processes multi-modal inputs, including screen images and audio, simulating human perception to understand computer interfaces and environments.
- Information Extraction and Understanding: Large multi-modal models like GPT-4V are used to identify visual elements, text information within images, and instructions or feedback from audio.
- Self-Reflection Mechanism: A reflection module allows Cradle to evaluate the success of previously executed actions and analyze the reasons for failures, providing insights for strategy adjustments.
- Task Inference and Planning: A task inference module determines the current highest-priority task, and the action planning module then generates the actions needed to accomplish it.
- Skill Generation and Updating: The skill planning module is responsible for generating new skills or updating existing ones based on the current task. Skills are represented as code functions that can be instantiated and executed.
- Memory and Knowledge Management: Cradle possesses both long-term and short-term memory systems, storing past experiences and skills for retrieval and application when needed.
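The principles above can be sketched as a small loop in Python. This is a toy illustration under stated assumptions, not the implementation in the Cradle repository: the `SKILLS` registry, the `skill` decorator, the `Memory` class, and the rule-based `infer_task` stub are all hypothetical, and the stub stands in for what would really be a call to a large multi-modal model. Screen content is represented as a plain string and actions as text labels.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical skill store: skill name -> code function that returns actions.
SKILLS: Dict[str, Callable[..., List[str]]] = {}


def skill(fn: Callable[..., List[str]]) -> Callable[..., List[str]]:
    """Register a function as a reusable skill under its own name."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def save_file() -> List[str]:
    return ["key:ctrl+s"]


@skill
def type_text(text: str) -> List[str]:
    return [f"key:{ch}" for ch in text]


class Memory:
    """Bounded short-term memory plus an unbounded long-term log."""

    def __init__(self, short_term_size: int = 3) -> None:
        self.short_term: Deque[Tuple[str, bool]] = deque(maxlen=short_term_size)
        self.long_term: List[Tuple[str, bool]] = []

    def record(self, skill_name: str, success: bool) -> None:
        entry = (skill_name, success)
        self.short_term.append(entry)  # oldest entry drops out when full
        self.long_term.append(entry)

    def last_failed(self) -> bool:
        """Tiny stand-in for self-reflection: did the latest action fail?"""
        return bool(self.short_term) and not self.short_term[-1][1]


def infer_task(screen_text: str, memory: Memory) -> Tuple[str, tuple]:
    """Rule-based stub for task inference (a real agent would query a model)."""
    if memory.last_failed():
        # Reflection says the last skill failed, so try it again.
        return memory.short_term[-1][0], ()
    if "untitled" in screen_text:
        return "type_text", ("hi",)
    return "save_file", ()


def step(screen_text: str, memory: Memory, succeeded: bool = True) -> List[str]:
    """One pass: infer the task, run the matching skill, record the outcome."""
    name, args = infer_task(screen_text, memory)
    actions = SKILLS[name](*args)
    # `succeeded` is passed in here; a real agent would judge it from the
    # next screenshot.
    memory.record(name, succeeded)
    return actions


if __name__ == "__main__":
    memory = Memory(short_term_size=2)
    print(step("untitled document", memory))
    print(step("report.txt", memory))
```

Representing skills as ordinary functions in a registry mirrors the article's description of skills as code functions that can be instantiated and executed. The split between a bounded `deque` and a full log is one simple way to model short-term versus long-term memory.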
Applications of Cradle:
- Desktop Software Automation: Automating repetitive tasks in desktop software like document editing, spreadsheet processing, and image editing.
- Web Content Interaction: Simulating user interactions with web pages, including form filling, button clicking, and navigating links.
- Game Environments: Controlling game characters in environments like Red Dead Redemption II to perform tasks, explore environments, and engage in combat.
- Professional Software Operation: Learning and executing specific creative tasks in software that requires specialized skills, such as graphic design or video editing software.
- Everyday Computer Tasks: Executing routine computer tasks like file management, email processing, and scheduling appointments.
Cradle’s Significance:
The release of Cradle marks a significant step towards a future where AI agents can seamlessly interact with computers, automating tasks and enhancing productivity. Its ability to control computers without requiring extensive training opens up new possibilities for AI applications across various domains, from gaming and software development to everyday computer use.
Future Prospects:
As research and development continue, Cradle is expected to become even more sophisticated, capable of handling increasingly complex tasks and adapting to diverse environments. Its potential to revolutionize how we interact with computers is vast, promising a future where AI agents become indispensable tools for individuals and businesses alike.
Project Links:
- GitHub Repository: https://github.com/BAAI-Agents/Cradle
- arXiv Technical Paper: https://arxiv.org/pdf/2403.03186
Cradle’s open-source nature encourages collaboration and innovation, paving the way for a new era of AI agents that can truly understand and interact with the digital world. The future of AI-powered computer control is here, and it’s just beginning.
Source: https://ai-bot.cn/cradle/