AI Papers Reader

Personalized digests of the latest AI research


SketchAgent: A Language-Driven AI That Sketches Like a Human

A new artificial intelligence (AI) system, called SketchAgent, can generate remarkably human-like sketches from simple text descriptions. Developed by researchers at MIT and Stanford University, the system avoids the need for extensive training datasets by cleverly leveraging the capabilities of large language models (LLMs). Instead of training on a massive collection of hand-drawn sketches, SketchAgent uses an off-the-shelf multimodal LLM and an intuitive sketching language to create its drawings, stroke by stroke, in a manner that mirrors the creative process of human sketching.

The core innovation lies in how SketchAgent translates natural language instructions into visual output. Unlike previous AI sketch generators that optimize all strokes simultaneously, SketchAgent generates its drawings sequentially. This approach is inspired by the iterative, dynamic nature of human drawing, where a person builds up a picture step by step, refining the work as they go along.

The researchers developed a custom “sketching language” for the LLM. This language describes strokes in terms of start and end points in a coordinate system overlaid on a gridded canvas. These coordinate-based descriptions are then translated into Bézier curves, a mathematical representation of smooth curves often used in computer graphics, which are rendered to create the sketch. For example, to draw a simple line, the user might write something like: “Draw a line from coordinate (5, 10) to coordinate (15, 20).” SketchAgent translates this concise instruction into a visually appealing, curved line. For more complex shapes, the user can give it a detailed sequence of instructions, producing more elaborate results.
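To make the coordinate-to-curve step concrete, here is a minimal sketch of how a stroke given by start and end grid coordinates could be rendered as a cubic Bézier curve. The function names and the placement of the interior control points are assumptions for illustration, not SketchAgent's actual stroke format.

```python
# Illustrative only: SketchAgent's real sketching language and curve
# fitting may differ. Here a stroke is just (start, end) grid points.

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

def stroke_to_polyline(start, end, samples=20):
    """Render "draw a line from start to end" as a gently curved stroke.

    The two interior control points sit a third and two thirds of the
    way along the segment, offset perpendicularly to give the slightly
    bowed, hand-drawn look (the offset amount is invented here).
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    off = 0.05  # perpendicular bow, purely illustrative
    c1 = (start[0] + dx / 3 - dy * off, start[1] + dy / 3 + dx * off)
    c2 = (start[0] + 2 * dx / 3 - dy * off, start[1] + 2 * dy / 3 + dx * off)
    return [cubic_bezier(start, c1, c2, end, i / (samples - 1))
            for i in range(samples)]

# The example instruction from the text: a line from (5, 10) to (15, 20).
points = stroke_to_polyline((5, 10), (15, 20))
```

The polyline starts exactly at (5, 10) and ends exactly at (15, 20); the interior samples trace the smooth curve that gets drawn on the canvas.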

To make the AI’s spatial reasoning more effective, the researchers use a numbered grid canvas. This helps the LLM handle spatial information more precisely and naturally. For collaborative sketching, the canvas remains accessible to both the human user and the agent, with the agent pausing to let the user add their strokes before it continues.

SketchAgent’s capabilities were tested in several scenarios. It was shown to generate sketches from diverse prompts, from simple objects like “a cat” to complex concepts like “a neural network”. The AI also successfully engaged in dialogue-driven drawing and collaborative sketching with human users. The collaborative sketches produced often blended human creativity and machine precision, creating surprisingly sophisticated and original images.

A key finding from user studies was that people judged SketchAgent’s sketches to be more similar to human sketches than those of other, more sophisticated AI generators that optimize only for the final output. This underscores the significance of the sequential sketching process in creating drawings that resonate with human aesthetic sensibilities.

While the research demonstrates the significant potential of this approach, some limitations remain. Currently, SketchAgent’s performance depends on the capabilities of the underlying LLM, and it sometimes struggles with highly detailed or complex drawings. The research team notes that ongoing advances in LLM technology will likely further improve SketchAgent’s output. Despite these limitations, SketchAgent showcases a promising new path toward more human-centric, creative AI systems capable of seamlessly integrating with human imagination and problem-solving.