AI Systems Learn to Collaborate by Building Workflows
📄 Full Paper
💬 Ask
A new paper published in the preprint archive arXiv by researchers at the Shanghai AI Laboratory explores a new way to build sophisticated artificial intelligence (AI) systems. Instead of relying on single, monolithic AI models that are designed for specific tasks, they propose using a collaborative approach where different AI models and tools work together, communicating via automated workflows. The key to their approach is a novel framework called GenAgent, which automates the process of creating these complex workflows.
GenAgent uses a combination of AI agents, each with a specific role in the workflow generation process. These agents include:
- PlanAgent: This agent is responsible for the overall planning of the workflow. It receives the task description and generates a high-level plan, breaking it down into smaller, more manageable steps.
- CombineAgent: This agent takes the current workflow and combines it with relevant workflows from a knowledge base, enriching it with additional capabilities.
- AdaptAgent: This agent adjusts the parameters of the workflow to better achieve the desired outcome.
- RetrieveAgent: This agent searches a knowledge base for relevant workflows that can be used as examples or inspiration.
- RefineAgent: This agent debugs and refines the generated workflow, ensuring its correctness and stability.
The key innovation of GenAgent lies in its representation of workflows using code. This allows the AI agents to better understand and manipulate the workflows, leveraging the powerful capabilities of large language models (LLMs) in code generation. The authors implement GenAgent on the ComfyUI platform, a popular tool for image generation using stable diffusion models, and showcase its ability to automatically generate workflows for complex tasks like image editing and video generation.
For example, GenAgent successfully completes a task that involves the following steps:
- Image-to-video conversion: An image of Budapest is converted into a 2-second video with 8 frames per second.
- Frame interpolation: The frame rate is increased to 24 frames per second.
- Upscaling: The resolution of the video is increased to 1024x1024.
- Saving: The final output is saved as a high-quality GIF video.
The researchers compare GenAgent with several baseline approaches, including zero-shot, few-shot, Chain-of-Thought, and Retrieval-Augmented Generation. Their experiments demonstrate that GenAgent significantly outperforms these baselines in generating complex workflows and achieving high success rates.
This research is a significant step towards developing collaborative AI systems that can effectively solve complex and diverse tasks. GenAgent’s ability to automatically create workflows opens up new possibilities for AI development and could lead to the creation of more sophisticated and intelligent AI systems in the future.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.