AI Papers Reader

Personalized digests of latest AI research


Large Language Models Can Now "Understand" Symbolic Graphics Programs

Large language models (LLMs) are increasingly used in many applications, including code generation. However, assessing their true understanding of code, especially for complex domains like graphics, remains challenging. To address this, researchers have developed a new benchmark called SGP-Bench that evaluates LLMs’ ability to “understand” symbolic graphics programs, which are a common way to represent visual content.

Symbolic graphics programs use a set of instructions to procedurally generate images or 3D objects. For example, a simple program might describe a square with a circle inside, which could be rendered into an image. The SGP-Bench test assesses how well LLMs can answer questions about the visual content of such programs, even without seeing the rendered image. This is like asking an LLM to “imagine” the image based on the program description alone.
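To make the "square with a circle inside" example concrete, here is a minimal sketch in Python. It assumes the program is written in SVG and that the question is posed as multiple choice; neither detail comes from this summary, and the prompt wording, the `build_prompt` helper, and the answer options are hypothetical illustrations rather than the benchmark's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical symbolic graphics program: a square outline with a red circle inside.
# SVG is assumed here purely for illustration.
SVG_PROGRAM = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="10" width="80" height="80" fill="none" stroke="black"/>
  <circle cx="50" cy="50" r="30" fill="red"/>
</svg>"""


def build_prompt(program: str, question: str, choices: list[str]) -> str:
    """Format a question about a program as text only; no rendered image is included."""
    lines = [
        "Examine the following graphics program and answer the question.",
        "",
        program,
        "",
        f"Question: {question}",
    ]
    for index, choice in enumerate(choices):
        letter = chr(ord("A") + index)  # A, B, C, ...
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


# Sanity check that the program is well-formed XML before using it.
root = ET.fromstring(SVG_PROGRAM)
shape_tags = [child.tag.split("}")[-1] for child in root]  # strip the XML namespace

prompt = build_prompt(
    SVG_PROGRAM,
    "What shape is drawn inside the square?",
    ["Triangle", "Circle", "Star", "Hexagon"],
)
print(prompt)
```

The model only ever sees the text in `prompt`. To pick "Circle" it has to work out from the coordinates and element names what the rendered picture would look like.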

The researchers found that even the most powerful LLMs currently available struggle with this task, demonstrating that there’s still a significant gap between human and machine understanding of symbolic graphics programs. They also proposed a new technique called Symbolic Instruction Tuning (SIT) to improve LLM performance. SIT involves finetuning LLMs on a dataset of symbolic programs and corresponding question-answer pairs, effectively teaching them to associate program code with visual descriptions.
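As a rough illustration of what SIT training data could look like, the sketch below pairs a program with a question and its answer and serializes the result as JSON Lines. The field names (`instruction`, `input`, `output`) follow a common instruction-tuning layout and are an assumption, as are the `make_sit_record` and `to_jsonl` helpers; the summary does not specify the actual dataset schema.

```python
import json


def make_sit_record(program: str, question: str, answer: str) -> dict:
    """Pair a symbolic program with a question and its answer (hypothetical schema)."""
    return {
        "instruction": question,  # what the model is asked
        "input": program,         # the symbolic graphics program as plain text
        "output": answer,         # the visual description the model should learn to produce
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records one JSON object per line, a common finetuning file format."""
    return "\n".join(json.dumps(record) for record in records)


record = make_sit_record(
    program='<svg xmlns="http://www.w3.org/2000/svg"><circle cx="50" cy="50" r="30"/></svg>',
    question="What shape does this program draw?",
    answer="A circle.",
)
jsonl_text = to_jsonl([record])
print(jsonl_text)
```

Finetuning on many such records is what would teach the model to connect program text to the visual content it describes.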

The researchers tested a variety of LLMs on SGP-Bench and found that models finetuned with SIT consistently achieved better results, especially on challenging semantic questions. They also conducted a human study to validate the benchmark, finding high agreement between human and GPT-4o responses to the same questions.
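One simple way to quantify agreement between two sets of responses is the fraction of questions on which they give the same answer. The sketch below assumes each response is a single answer label; the summary does not say which agreement measure the study used, so treat the `agreement_rate` helper as an illustration only. The same function gives accuracy when the second list holds the reference answers.

```python
def agreement_rate(answers_a: list[str], answers_b: list[str]) -> float:
    """Return the fraction of positions where the two answer lists match."""
    if len(answers_a) != len(answers_b):
        raise ValueError("Both answer lists must cover the same questions.")
    if not answers_a:
        raise ValueError("At least one question is required.")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


# Made-up labels for four questions, purely for illustration.
human_answers = ["A", "B", "C", "D"]
model_answers = ["A", "B", "C", "A"]
print(agreement_rate(human_answers, model_answers))  # 0.75
```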

This study highlights the importance of benchmarks designed specifically to evaluate how well LLMs understand complex domains. SGP-Bench provides a valuable tool for assessing the capabilities of LLMs and pushing the boundaries of their visual reasoning abilities. It also suggests that new techniques like SIT could be crucial for enabling LLMs to effectively understand and leverage symbolic graphics programs in a variety of applications.