
LLMs for Medical Diagnosis: A Chain of Diagnosis Approach

Large language models (LLMs) are transforming many fields, including healthcare. Their ability to learn complex patterns from vast amounts of text data makes them ideal for tasks like disease diagnosis. However, LLMs often struggle with interpretability, meaning it’s difficult to understand how they arrive at their conclusions. This lack of transparency can be a major obstacle to their adoption in medical settings, where trust and accountability are paramount.

A new study posted on arXiv proposes a novel approach called “Chain of Diagnosis” (CoD) that aims to address this challenge. CoD enhances the interpretability of LLM-based medical diagnosis by transforming the diagnostic process into a transparent chain of reasoning that mirrors how a physician thinks.

Here’s how CoD works (a minimal code sketch follows the list):

  1. Symptom Abstraction: The LLM first summarizes the patient’s symptoms into a concise representation, akin to how a doctor would note the patient’s chief complaint.
  2. Disease Recall: The LLM then identifies a set of potential diseases based on the symptoms and its knowledge base, effectively narrowing down the diagnostic possibilities.
  3. Diagnostic Reasoning: The LLM proceeds to analyze each potential disease, carefully considering the patient’s symptoms and the known characteristics of each condition. This reasoning process is presented as a series of steps, making the LLM’s thought process transparent.
  4. Confidence Assessment: To further enhance interpretability, the LLM outputs a confidence distribution, assigning a probability to each potential disease based on the strength of the evidence. This allows users to assess the certainty of the LLM’s diagnosis.
  5. Decision Making: Finally, the LLM either commits to a diagnosis or requests further information from the patient, based on a pre-defined confidence threshold. If the top candidate’s confidence falls below the threshold, the LLM inquires about an additional symptom, aiming to reduce uncertainty and improve the accuracy of the diagnosis.

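To make the five steps concrete, here is a minimal Python sketch of a CoD-style diagnostic loop. The `llm_*` helpers, the example diseases, and the threshold value are placeholder assumptions standing in for prompts to an underlying model, not the authors’ implementation; the sketch only shows how symptom abstraction, disease recall, reasoning, the confidence distribution, and the threshold-based decision fit together.

```python
import random

# Placeholder stubs standing in for LLM prompts (hypothetical, for illustration only).
def llm_summarize(symptoms):
    """Step 1 stub: abstract the raw complaints into a concise summary."""
    return "; ".join(symptoms)

def llm_rank_diseases(summary, top_k=3):
    """Step 2 stub: recall candidate diseases that could explain the summary."""
    return ["common cold", "influenza", "allergic rhinitis"][:top_k]

def llm_score_disease(summary, disease):
    """Step 3 stub: reason about how well the disease fits the summary."""
    return random.random()

def llm_pick_symptom(summary, confidence):
    """Step 5 stub: choose the most informative symptom to ask about next."""
    return "fever"

CONFIDENCE_THRESHOLD = 0.5  # assumed value; the paper describes a pre-defined threshold

def chain_of_diagnosis(symptoms, max_inquiries=3):
    """Run the CoD steps: abstraction, recall, reasoning, confidence, decision."""
    for _ in range(max_inquiries + 1):
        summary = llm_summarize(symptoms)                                 # 1. symptom abstraction
        candidates = llm_rank_diseases(summary)                           # 2. disease recall
        scores = {d: llm_score_disease(summary, d) for d in candidates}   # 3. diagnostic reasoning
        total = sum(scores.values())
        confidence = {d: s / total for d, s in scores.items()}            # 4. confidence distribution
        best = max(confidence, key=confidence.get)

        # 5. decision: diagnose if confident enough, otherwise inquire about a new symptom
        if confidence[best] >= CONFIDENCE_THRESHOLD:
            return {"diagnosis": best, "confidence": confidence}
        symptoms = symptoms + [llm_pick_symptom(summary, confidence)]

    return {"diagnosis": best, "confidence": confidence}

print(chain_of_diagnosis(["cough", "sore throat"]))
```

The threshold governs the trade-off between deciding immediately and asking more questions: a higher value makes the model more cautious and more likely to request an additional symptom before committing to a diagnosis.
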
The researchers used CoD to develop DiagnosisGPT, an LLM capable of diagnosing 9,604 diseases. Their experiments show that DiagnosisGPT outperforms other LLMs on public diagnostic datasets and a newly created benchmark, achieving an accuracy exceeding 90% in many cases. Furthermore, DiagnosisGPT’s performance improves when symptom inquiries are allowed, demonstrating that it can refine its diagnoses as new information becomes available.

CoD’s strength lies in its ability to make LLM-based medical diagnosis more transparent and explainable. By providing a clear and comprehensive reasoning pathway, CoD helps build trust in LLMs for medical applications, paving the way for their broader adoption in healthcare. However, the researchers acknowledge limitations, such as the reliance on synthetic data and the potential for misdiagnosis, highlighting the need for continued research and rigorous evaluation.

The authors believe that CoD offers a novel approach to enhancing the interpretability of LLMs in medical diagnosis. By making diagnostic reasoning more transparent and controllable, CoD represents a significant step towards building trust and confidence in AI-assisted medical practice.