2025-08-22
AI for Software Development
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Relevance: This paper presents MeshCoder, an LLM-powered framework that translates 3D point clouds into editable Blender Python scripts. This directly aligns with the ‘code completion and generation’ aspect of AI for Software Development, as it allows for programmatic reconstruction of 3D objects. The ability to perform ‘intuitive geometric and topological editing through convenient code modifications’ highlights its practical utility for developers, enhancing their workflow in 3D design tasks.
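As a rough illustration of the code-as-shape-representation idea (not MeshCoder's actual output format), a tiny emitter can turn primitive parameters into an editable Blender Python snippet; the function name and parameter choices here are hypothetical, and `primitive_cylinder_add` is a real Blender operator:

```python
def emit_cylinder_script(radius: float, depth: float,
                         location=(0.0, 0.0, 0.0)) -> str:
    """Emit a Blender Python snippet that reconstructs a cylinder primitive.

    Re-running the script after editing the parameters is the kind of
    'geometric editing through code modification' workflow the paper
    describes; a real system would emit far richer programs.
    """
    return (
        "import bpy\n"
        f"bpy.ops.mesh.primitive_cylinder_add(radius={radius}, "
        f"depth={depth}, location={tuple(location)})\n"
    )

script = emit_cylinder_script(radius=0.5, depth=2.0, location=(1.0, 0.0, 0.0))
```

Changing `radius` and regenerating is all it takes to edit the shape, which is the practical appeal of a code-based 3D representation.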
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Relevance: This paper introduces Agent Foundation Models (AFMs) that can perform complex problem-solving, including in ‘code agent settings.’ This directly relates to AI assisting software developers by demonstrating how AI agents can autonomously handle intricate coding tasks and interact with digital environments. The ability to dynamically activate ‘tool agents’ positions these AFMs as advanced assistants for software development workflows that go beyond mere code generation.
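A minimal sketch of the dynamic tool-agent activation idea, with stub agents and a keyword heuristic standing in for the learned routing policy (all names here are illustrative, not the paper's API):

```python
from typing import Callable, Dict

def search_agent(query: str) -> str:
    # Stub: a real tool agent would call a retrieval backend.
    return f"[search results for: {query}]"

def code_agent(spec: str) -> str:
    # Stub: a real tool agent would generate and execute code.
    return f"def solution():\n    # implements: {spec}\n    pass"

TOOL_AGENTS: Dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "code": code_agent,
}

def controller(task: str) -> str:
    """Route a sub-task to a tool agent. An AFM does this with a
    learned policy; a keyword check is the simplest stand-in."""
    kind = "code" if "implement" in task else "search"
    return TOOL_AGENTS[kind](task)
```

The point of the end-to-end framing is that the routing decision itself is learned rather than hard-coded as it is here.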
AI Agents
Virtuous Machines: Towards Artificial General Science
Relevance: This paper demonstrates an ‘agentic AI system’ capable of autonomously navigating the entire scientific workflow, from hypothesis generation to data collection and manuscript preparation. This exemplifies advanced AI agent capabilities, showing systems that ‘perceive their environment, reason about tasks, plan actions, and execute them using available tools,’ fulfilling the definition of an AI agent, particularly in a complex, real-world domain like scientific discovery.
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
Relevance: This survey paper directly defines and contextualizes ‘Agentic Science,’ where AI systems evolve into ‘autonomous research partners’ with ‘full scientific agency.’ It unifies perspectives on foundational capabilities, core processes, and domain-specific realizations, providing a comprehensive overview of how AI systems are progressing towards autonomous, agentic behaviors, including hypothesis generation, experimental design, and iterative refinement, aligning perfectly with the research topic.
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Relevance: This work introduces the ‘Chain-of-Agents’ paradigm, enabling LLMs to simulate ‘multi-agent collaboration’ by dynamically activating ‘tool agents’ and ‘role-playing agents.’ This directly addresses the creation of autonomous software systems that can ‘break complex tasks into manageable steps’ and ‘interact with digital tools.’ It advances the field by developing ‘Agent Foundation Models’ (AFMs) with enhanced problem-solving capabilities in agentic settings.
LLM Evaluation Methods
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models
Relevance: This paper proposes FinCDM, a novel cognitive diagnosis framework for financial LLMs that moves beyond single scores to ‘knowledge-skill level’ evaluation. By identifying ‘what financial skills and knowledge they have or lack,’ it offers a more interpretable and ‘trustworthy’ assessment. This aligns strongly with HCI’s emphasis on understanding model limitations, ensuring fairness, and facilitating targeted model development based on fine-grained diagnostic insights.
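The knowledge-skill framing can be sketched with a Q-matrix (which skills each test item requires) and per-skill accuracy; this crude frequency estimate is only a stand-in for the cognitive-diagnosis models the paper builds on, and the skill names are invented:

```python
from typing import Dict, List, Set

def diagnose(q_matrix: List[Set[str]], responses: List[bool]) -> Dict[str, float]:
    """Per-skill mastery estimate: the fraction of correctly answered
    items among the items that require each skill."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for skills, ok in zip(q_matrix, responses):
        for skill in skills:
            totals[skill] = totals.get(skill, 0) + 1
            correct[skill] = correct.get(skill, 0) + int(ok)
    return {skill: correct[skill] / totals[skill] for skill in totals}

# Three items tagged with hypothetical financial skills, one model's answers.
profile = diagnose(
    [{"NPV"}, {"NPV", "tax"}, {"tax"}],
    [True, False, False],
)
```

The output is a skill profile rather than a single score, which is exactly the shift from 'scores to skills' the paper argues for.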
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
Relevance: MMAU-Pro is presented as a comprehensive benchmark for evaluating AI systems’ ‘audio general intelligence’ across ‘49 unique skills’ and ‘multiple complex dimensions,’ including multi-hop reasoning. This directly supports the LLM evaluation goal of benchmarking capabilities and limitations across diverse tasks. The focus on ‘actionable perspectives’ derived from revealing ‘significant limitations’ is crucial for guiding future model improvements from an HCI perspective.
Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding
Relevance: This paper introduces a large-scale Bayesian evaluation framework to assess LLMs’ ‘moral values understanding’ by modeling human annotator disagreements. This method goes ‘beyond human judgment’ to capture uncertainty, offering a nuanced approach to ethical and bias evaluation. From an HCI perspective, understanding how LLMs interpret moral dimensions and identifying their ‘sensitive moral detection capabilities’ or potential pitfalls is critical for ensuring alignment and trustworthiness.
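The core idea of keeping annotator disagreement as uncertainty, rather than collapsing it to a majority label, can be sketched with a simple Beta-Binomial posterior (the paper's model is considerably richer; this is only the underlying intuition):

```python
import math

def beta_posterior(agree: int, disagree: int,
                   a: float = 1.0, b: float = 1.0):
    """Posterior mean and std of the probability that a text expresses
    a moral value, given annotator votes, under a Beta(a, b) prior
    (uniform by default)."""
    a_post, b_post = a + agree, b + disagree
    n = a_post + b_post
    mean = a_post / n
    var = (a_post * b_post) / (n ** 2 * (n + 1))
    return mean, math.sqrt(var)

# Same majority direction, very different confidence:
m_small, s_small = beta_posterior(3, 2)      # 5 annotators, split
m_large, s_large = beta_posterior(30, 20)    # 50 annotators, same ratio
```

A majority-vote label would treat both cases identically; the posterior keeps the 3-vs-2 case visibly uncertain.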
Reinforcement Learning
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Relevance: This work leverages ‘agentic reinforcement learning’ to enable LLMs to perform ‘end-to-end complex problem-solving’ by dynamically activating tool and role-playing agents. This demonstrates RL’s application in enabling autonomous systems to ‘learn optimal behaviors by interacting with environments.’ For HCI, understanding how agents learn through RL and how this impacts their collaboration or independent work is a key area of study.
CAMAR: Continuous Actions Multi-Agent Routing
Relevance: CAMAR introduces a new ‘Multi-agent reinforcement learning (MARL)’ benchmark focusing on ‘continuous state and action spaces’ and ‘challenging coordination and planning tasks.’ This directly advances RL research in multi-agent settings, which is essential for studying interactions and cooperation among agents. From an HCI perspective, this provides a testbed for designing environments and interfaces that facilitate intuitive human-agent collaboration and coordination.
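What a continuous-action routing step looks like can be sketched as follows; CAMAR's actual dynamics, maps, and observation spaces are far richer, so this is only a toy environment step under assumed dynamics:

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]

def step(positions: List[Vec], velocities: List[Vec],
         dt: float = 0.1, min_dist: float = 0.2):
    """One step of a toy continuous-space multi-agent routing task:
    integrate each agent's chosen velocity (its continuous action)
    and flag pairwise collisions."""
    new_pos = [(x + vx * dt, y + vy * dt)
               for (x, y), (vx, vy) in zip(positions, velocities)]
    collided = [False] * len(new_pos)
    for i in range(len(new_pos)):
        for j in range(i + 1, len(new_pos)):
            dx = new_pos[i][0] - new_pos[j][0]
            dy = new_pos[i][1] - new_pos[j][1]
            if math.hypot(dx, dy) < min_dist:
                collided[i] = collided[j] = True
    return new_pos, collided

# Two agents heading toward each other from a safe distance: no collision yet.
pos, hit = step([(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (-1.0, 0.0)])
```

The coordination challenge is choosing velocities that reach each agent's goal while keeping every pairwise distance above the threshold.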
Reinforcement Learning with Rubric Anchors
Relevance: This paper extends Reinforcement Learning from Verifiable Rewards (RLVR) to open-ended tasks by incorporating ‘rubric-based rewards.’ By using structured, ‘model-interpretable criteria for automatic scoring of subjective outputs,’ it enables RL to align LLMs with nuanced human preferences and stylistic control. This is highly relevant to HCI, as it explores how humans can ‘effectively guide agent learning’ and tailor agent outputs to be ‘human-like’ and expressive.
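The rubric-as-reward idea can be sketched as a weighted checklist scored programmatically; in the paper the criteria are judged by a model rather than by hand-written predicates, so the rubric below is purely illustrative:

```python
from typing import Callable, List, Tuple

# Each anchor is (weight, check); the checks stand in for the
# model-interpretable criteria a grader model would apply.
Rubric = List[Tuple[float, Callable[[str], bool]]]

def rubric_reward(output: str, rubric: Rubric) -> float:
    """Weighted fraction of rubric anchors the output satisfies,
    usable as a scalar RL reward for open-ended generations."""
    total = sum(w for w, _ in rubric)
    earned = sum(w for w, check in rubric if check(output))
    return earned / total

style_rubric: Rubric = [
    (2.0, lambda s: len(s.split()) >= 5),        # sufficiently detailed
    (1.0, lambda s: not s.isupper()),            # not shouting
    (1.0, lambda s: s.rstrip().endswith(".")),   # complete sentence
]
```

Because each anchor is scored separately, the reward signal explains *why* an output scored low, which is what lets humans steer style and not just correctness.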
Explainable AI
CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection
Relevance: CorrSteer focuses on extracting ‘interpretable features’ from LLMs using Sparse Autoencoders to enable downstream steering, bias mitigation, and improved reasoning. This directly contributes to Explainable AI by making LLMs more transparent. The ability to identify ‘semantically meaningful patterns’ aligned with tasks reveals the ‘underlying capabilities that drive performance,’ helping users and developers understand why an LLM behaves in a certain way and making its decision-making processes more interpretable.
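The correlation-based selection step can be sketched without the SAE itself: given per-sample feature activations and task scores, rank features by absolute Pearson correlation and keep the top candidates for steering. The data and function names below are illustrative, not CorrSteer's implementation:

```python
from typing import List

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant sequence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_steering_features(activations: List[List[float]],
                             task_scores: List[float],
                             top_k: int = 2) -> List[int]:
    """Rank feature indices by |correlation| between each feature's
    per-sample activation and the task score; the top_k become
    steering candidates."""
    n_features = len(activations[0])
    corrs = [abs(pearson([row[f] for row in activations], task_scores))
             for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: -corrs[f])[:top_k]

# Feature 0 tracks the score exactly; feature 2 is dead (always zero).
acts = [[1.0, 5.0, 0.0], [2.0, 1.0, 0.0], [3.0, 4.0, 0.0], [4.0, 2.0, 0.0]]
scores = [1.0, 2.0, 3.0, 4.0]
```

The selected features are the interpretable handles: amplifying or suppressing them at inference time is the steering intervention.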
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
Relevance: This paper introduces ‘Atomic Thought’ and ‘Atomic Thought Rewards’ to guide LLM agents, leading to ‘more interpretable, human-like reasoning patterns.’ By decomposing reasoning into ‘fine-grained functional units’ and providing ‘supervision anchors,’ Atom-Searcher makes the internal workings of complex AI agents more transparent. This directly supports XAI’s goal of enabling users to understand how AI agents make decisions and learn from their experience, fostering trust.
Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods
Relevance: While focused on machine unlearning, this paper presents a ‘visual analytics system’ designed to help researchers ‘analyze and understand the behavior’ of different unlearning methods. By allowing comparison at ‘class-, instance-, and layer-levels,’ it offers crucial interpretability into how models change after unlearning. This system acts as an XAI tool, making complex model behaviors transparent to users and researchers and informing decisions about trust and data privacy.