3D Arena: Bridging the Gap in Generative 3D Evaluation
The rapid advancement of generative 3D technology has outpaced the development of reliable evaluation methods. Traditional benchmarks often rely either on image-based metrics that fail to capture the nuances of 3D structure or on geometric measures that miss perceptual appeal and real-world utility. To address this critical gap, researchers have introduced 3D Arena, an open platform designed for large-scale human preference evaluation of image-to-3D generation models.
Since its launch in June 2024, 3D Arena has emerged as a significant resource, collecting over 123,000 votes from more than 8,000 users across 19 state-of-the-art models. This scale supports robust statistical analysis and makes it the largest human preference evaluation for generative 3D to date. The platform employs a simple yet effective pairwise comparison system: users are shown two anonymized 3D outputs side by side and asked to choose which they prefer. This approach, inspired by similar successful platforms in other AI domains, allows for the direct assessment of subjective qualities such as visual appeal, structural coherence, and perceived usefulness.
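As a rough illustration of these mechanics, a single vote in such a pairwise system could be recorded along the following lines. The field names and sampling logic are hypothetical; the platform's actual schema is not described in this summary.

```python
import random
from dataclasses import dataclass


@dataclass
class PairwiseVote:
    """Hypothetical record of one anonymized head-to-head vote."""
    source_image_id: str  # the input image both models received
    model_a: str          # shown anonymously as one of the two outputs
    model_b: str          # shown anonymously as the other output
    winner: str           # "a" or "b", the output the voter preferred
    voter_id: str         # authenticated user, used for fraud checks


def sample_matchup(models: list[str]) -> tuple[str, str]:
    """Pick two distinct models in random order, so voters cannot
    infer which system produced which output."""
    a, b = random.sample(models, 2)
    return a, b
```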
A key contribution of the paper is the analysis of the collected data, which reveals interesting patterns in human preferences. For instance, the results indicate a clear preference for Gaussian splat outputs over traditional mesh formats, with splats holding a 16.6-point ELO advantage. Similarly, textured 3D models were favored over untextured ones, with a substantial 144.1-point ELO advantage. This suggests that immediate visual presentation plays a significant role in user perception, even when technical trade-offs exist. For example, the paper highlights a controlled comparison in which an identical underlying model earned a 78-point ELO advantage for its splat version over its mesh counterpart, attributable solely to rendering differences that enhance visual appeal.
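To put these gaps in perspective, the conventional logistic ELO formula maps a rating difference to an expected head-to-head win rate. The 400-point scale below is the standard chess convention and an assumption here; the paper's exact rating parameters are not restated in this summary.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the conventional
    logistic ELO model: E = 1 / (1 + 10 ** (-gap / scale))."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


for gap in (16.6, 78.0, 144.1):
    print(f"{gap:6.1f} ELO gap -> ~{expected_win_rate(gap):.0%} expected win rate")
# 16.6 -> ~52%, 78.0 -> ~61%, 144.1 -> ~70%
```

Under this convention, the 144.1-point texturing advantage corresponds to roughly a 70% chance of winning any given head-to-head vote, while the 16.6-point splat advantage is a much narrower edge of about 52%.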
The platform also prioritizes data integrity through user authentication and statistical fraud detection, achieving a measured 99.75% rate of authentic user votes. An ELO rating system, widely used to rank players in chess and competitive gaming, converts these pairwise comparisons into a reliable ranking of models.
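A minimal sketch of how such a rating could be maintained from individual votes is shown below; the K-factor and starting ratings are common defaults assumed for illustration, not settings documented by the platform.

```python
def update_elo(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Apply one ELO update after a pairwise vote.  K = 32 is a common
    default; the value 3D Arena actually uses is an assumption here."""
    expected_w = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    expected_l = 1.0 - expected_w
    return r_winner + k * (1.0 - expected_w), r_loser + k * (0.0 - expected_l)


# Two models that start level shift by 16 points each after a single vote.
new_winner, new_loser = update_elo(1000.0, 1000.0)
print(new_winner, new_loser)  # 1016.0, 984.0
```

Because each update depends only on the current ratings and the vote outcome, rankings can be recomputed incrementally as votes arrive, which suits a continuously running leaderboard.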
Beyond providing a benchmark, 3D Arena aims to guide the development of better evaluation methodologies. The findings suggest a disconnect between what professional workflows prioritize (e.g., clean mesh topology for animation) and what casual users tend to favor (immediate visual impact). To bridge this gap, the paper recommends exploring multi-criteria assessment, task-oriented evaluations, and format-aware comparisons. For example, future evaluations could incorporate separate scores for visual appeal and structural quality, or present users with wireframe views to specifically assess topology.
By offering an open platform and a rich dataset, 3D Arena facilitates reproducibility and encourages further research in the critical area of generative 3D evaluation. Its community-driven approach and robust data collection position it as a valuable benchmark for the field, driving progress towards AI models that are not only technically sound but also perceptually appealing and practically useful.