AI Papers Reader

Personalized digests of latest AI research

View on GitHub

LLM-Powered Autonomous Vehicles Learn to Better Interact with Humans

Autonomous vehicles (AVs) are increasingly common, but their interactions with human-driven vehicles (HVs) remain a challenge. A new paper, “Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions,” proposes a novel system that leverages large language models (LLMs) to significantly improve AV-HV interactions. The system addresses three key challenges: real-time adaptability and interpretability of AV decisions, the heterogeneity and unpredictability of human drivers, and the complexity and diversity of driving scenarios.

The core of the system is a parallel Actor-Reasoner framework. The Reasoner uses an LLM to interpret the driving intentions and styles of HVs through natural language processing, enhanced by the “Chain-of-Thought” (CoT) prompting technique which helps extract logical reasoning. This approach is particularly relevant because LLMs can understand the nuances of human communication, potentially bridging the gap in understanding between humans and AVs which existing methods often fail to do. For example, if an HV indicates through language that it intends to yield, the Reasoner would be able to interpret this, and use this information for decision-making.

The Actor, on the other hand, is a high-speed memory-based system. During training, the Reasoner interacts with simulated HVs exhibiting diverse driving styles, creating a rich database of interactions. This database is partitioned based on the inferred driving styles of the simulated HVs. This partition allows for fast retrieval of relevant past experiences when interacting with actual HVs. For example, if the Reasoner identifies an HV as having a conservative driving style, the Actor can quickly access relevant past scenarios involving similarly conservative drivers, enabling a quicker and more contextually appropriate response.

The system employs a two-layer memory retrieval system to increase the efficiency of the Actor. The first layer filters scenarios based on the overall driving situation. The second layer then refines the selection by comparing the textual descriptions of experiences, identifying the most similar past scenario and suggesting an appropriate action.

The integration of an external human-machine interface (eHMI) further enhances communication. The Reasoner generates eHMI messages, such as textual displays on the AV, to explicitly communicate the AV’s intentions to the HV. This is critical for building trust and avoiding misunderstandings. Imagine a scenario where the AV is turning left and encounters an oncoming vehicle. The Reasoner, after assessing the situation, might generate an eHMI message that clearly states the AV’s intent to turn, reducing the likelihood of a dangerous near miss.

The researchers validated their framework through extensive simulations and real-world field tests. Ablation studies demonstrated the effectiveness of each component: the memory partition significantly improved efficiency, the two-layer memory retrieval increased the accuracy of action selection, and including HV instructions enhanced the system’s overall performance. Real-world tests in various scenarios, including intersections, roundabouts, and merging areas, corroborated these results. The use of LLMs allowed for adaptive and nuanced responses that outperformed other state-of-the-art methods, especially in complex scenarios involving multiple vehicles.

This research demonstrates significant progress in enhancing AV-HV interaction. The Actor-Reasoner framework, powered by LLMs and enhanced by a thoughtful integration of memory and eHMI, offers a promising approach to making autonomous driving safer and more efficient. Future work will focus on further refining the eHMI communication and extending the system’s capabilities to collaborative driving scenarios.