VisCoder: A New AI Model Excels at Generating Visualization Code

Large language models (LLMs) have made impressive strides in code generation. Generating code that visualizes data, such as charts and diagrams, has remained a significant challenge, however, because it requires not only syntactically correct code but also a visual output that accurately reflects the data and the instructions. A new paper introduces VisCoder, an LLM fine-tuned specifically to tackle this problem.

VisCoder is trained on VisCode-200K, a dataset of over 200,000 examples of Python visualization code drawn from two main sources. The first is verified plotting code from open-source repositories, paired with natural language instructions and rendered plots, so the model learns both what the code should do (the instruction) and how its output should look (the rendered plot). The second is multi-turn dialogues in which models revise incorrect code based on runtime feedback, which trains the model to debug its own visualization code.
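To make the two sources concrete, here is a rough sketch of what training examples of each kind might look like. The field names and contents are hypothetical illustrations, not the dataset's actual schema.

```python
# Source 1: verified plotting code paired with an instruction and its
# rendered output. Field names here are invented for illustration.
plotting_example = {
    "instruction": "Plot monthly revenue as a line chart with a title.",
    "code": "import matplotlib.pyplot as plt\n...",  # verified to execute
    "rendered_plot": "plot_0001.png",                # image the code produces
}

# Source 2: a correction dialogue in which a model repairs failing code
# based on runtime feedback.
correction_example = {
    "instruction": "Plot monthly revenue as a line chart with a title.",
    "turns": [
        {
            "code": "plt.plot(revenue)\nplt.titel('Monthly Revenue')",  # buggy draft
            "runtime_feedback": "AttributeError: module 'matplotlib.pyplot' "
                                "has no attribute 'titel'",
        },
        {"code": "plt.plot(revenue)\nplt.title('Monthly Revenue')"},    # revised fix
    ],
}
```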

Imagine you want to create a bar chart showing sales figures for different regions. VisCoder can take your instruction: “Create a bar chart showing sales by region, with regions on the x-axis and sales on the y-axis. Make the bars blue.” The model then generates the corresponding Python code using libraries like Matplotlib or Seaborn. Critically, VisCoder is also trained to recognize common failures: if its code crashes at runtime or renders an empty plot, it can iteratively revise the code based on the error message until the chart comes out correctly.
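For illustration, here is one plausible Matplotlib program for that instruction. The sales figures are invented, and the exact code VisCoder emits would of course vary.

```python
import matplotlib.pyplot as plt

# Hypothetical data; in practice this would come from the user's dataset.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]

fig, ax = plt.subplots()
ax.bar(regions, sales, color="blue")  # blue bars, as the instruction requests
ax.set_xlabel("Region")
ax.set_ylabel("Sales")
ax.set_title("Sales by Region")
plt.show()
```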

The researchers evaluated VisCoder on PandasPlotBench, a standard benchmark for visualization code generation. VisCoder significantly outperformed other open-source models at generating executable and accurate visualization code, and the best-performing version, VisCoder-7B, even rivaled proprietary models like GPT-4o-mini, especially once its self-debugging capability was used.

A key innovation in the evaluation was a β€œself-debug evaluation mode.” Here, if the initial code failed, the model was given multiple attempts to revise its output based on execution traces (error messages). This mimics a real-world developer workflow, where programmers iteratively fix errors until the code runs correctly. The results showed that VisCoder could achieve over 90% execution pass rates on Matplotlib and Seaborn in this self-debugging mode.
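A minimal sketch of such a self-debug loop appears below. The `model.generate` API and the retry budget are assumptions for illustration; the paper's actual protocol and prompt format may differ.

```python
import os
import subprocess
import tempfile

MAX_ROUNDS = 3  # assumed retry budget; the benchmark fixes some small number


def run_plot_code(code: str) -> tuple[bool, str]:
    """Execute candidate plotting code in a subprocess; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        ["python", path],
        capture_output=True, text=True, timeout=60,
        env={**os.environ, "MPLBACKEND": "Agg"},  # headless backend so show() can't block
    )
    return proc.returncode == 0, proc.stderr


def self_debug(model, instruction: str) -> str:
    """Generate code, then feed execution traces back until it runs or retries run out."""
    code = model.generate(instruction)  # hypothetical model API
    for _ in range(MAX_ROUNDS):
        ok, trace = run_plot_code(code)
        if ok:
            break
        # Ask the model to revise its own output, given the error message.
        code = model.generate(
            f"{instruction}\n\nYour code failed with:\n{trace}\nPlease fix it."
        )
    return code
```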

The paper highlights the challenges of working with different visualization libraries. For example, generating plots with the Plotly library proved to be harder due to its more complex syntax and API. However, VisCoder consistently outperformed baseline models across all libraries tested.
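For a sense of the difference, here is the same blue bar chart written against Plotly's graph_objects API, where traces and layout are configured through nested objects that give a model more surface area to get wrong. This is an ordinary Plotly example, not code from the paper.

```python
import plotly.graph_objects as go

# The same hypothetical sales data as in the Matplotlib example above.
fig = go.Figure(
    data=[go.Bar(
        x=["North", "South", "East", "West"],
        y=[120, 95, 140, 80],
        marker=dict(color="blue"),  # bar color lives inside a nested marker object
    )]
)
fig.update_layout(
    title="Sales by Region",
    xaxis_title="Region",
    yaxis_title="Sales",
)
fig.show()
```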

These findings suggest that fine-tuning LLMs on a high-quality, domain-specific dataset with execution-grounded feedback can significantly improve their ability to generate accurate and executable visualization code. This advance could empower data scientists and analysts to quickly create effective visualizations, even if they lack deep programming expertise.