AI Papers Reader

Personalized digests of latest AI research

Reinforcement Learning Boosts Medical Report Data Extraction with Limited Samples

Researchers have developed a method for extracting structured information from medical documents with only a small number of training examples. The approach, detailed in a recent paper, uses reinforcement learning to make extraction from medical reports both more accurate and more complete, a crucial requirement for applications like report analysis and online consultations.

The core innovation lies in the Reinforcement Learning with Verifiable Rewards (RLVR) framework. This method addresses the challenges posed by the domain-specific nature of medical documents and the high cost of annotation. Traditional methods often struggle with the intricate schemas and unique terminology found in medical reports. This new approach, however, can achieve state-of-the-art performance using as few as 100 annotated samples.

A key aspect of the RLVR framework is its reward mechanism. It balances precision, to minimize fabricated or incorrect information (hallucinations), with recall, to ensure all essential data fields are captured. For instance, when extracting information from a patient’s medical report, a model trained with this method is incentivized not only to extract the correct values for fields like “blood pressure” or “cholesterol level” but also to cover every relevant indicator present in the document. The reward therefore evaluates both the accuracy of individual data points and the overall coverage of the required information.
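The balance between precision and recall can be illustrated with a small scoring function. This is a minimal sketch, not the paper's reward: the summary does not give the exact formula, so the function name `extraction_reward`, the exact-match comparison, and the use of F1 as the combined score are all assumptions made for illustration.

```python
def extraction_reward(predicted: dict, truth: dict) -> dict:
    """Score predicted key-value pairs against ground truth (illustrative only).

    Assumption: a pair is correct only when the key exists in the ground
    truth and the value matches exactly. The paper may compare differently.
    """
    correct = sum(1 for key, value in predicted.items()
                  if key in truth and truth[key] == value)
    # Precision falls when the model outputs wrong or invented pairs.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall falls when the model omits fields that are in the document.
    recall = correct / len(truth) if truth else 0.0
    # Assumption: combine the two with F1 so neither can be ignored.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "reward": f1}


# Hypothetical report: two fields are right, one is invented, one is missed.
truth = {"blood pressure": "120/80", "cholesterol level": "5.2", "heart rate": "72"}
predicted = {"blood pressure": "120/80", "cholesterol level": "5.2", "glucose": "6.1"}
scores = extraction_reward(predicted, truth)
```

In this example the invented "glucose" field lowers precision and the missing "heart rate" field lowers recall, so a model cannot raise its reward by either guessing extra fields or answering only the easy ones.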

The researchers fine-tuned a large multimodal model, Qwen2.5-VL-7B, using their RLVR method. The results show substantial improvements in F1 score, precision, and recall on medical Visual Information Extraction (VIE) tasks. For example, when processing a pathology report, the RLVR-enhanced model was significantly better than baseline models at extracting fields like “tumor stage” or “biopsy results” together with their corresponding values.

Furthermore, the study highlights the importance of “thinking” or reasoning capabilities within these models. The researchers compared models trained to generate an internal thought process before producing the final JSON output against models that produced the JSON directly, and found that the “thinking” models performed better. This is akin to how a human expert might first look over a scanned document, identify its key regions, and then systematically extract the relevant data. The research shows that this explicit reasoning step helps the model better understand the image structure and extract information with higher precision.
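A "think first, then answer in JSON" response needs to be split into its two parts before the extracted fields can be scored. The sketch below shows one way to do that. The `<think>...</think>` delimiters, the function name `split_response`, and the rule that a malformed response yields nothing are assumptions for illustration; the summary does not say how the paper marks the reasoning section.

```python
import json
import re

# Assumed format: reasoning inside <think> tags, followed by a JSON object.
THINK_PATTERN = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def split_response(response: str):
    """Return (reasoning, fields), or (None, None) if the format is violated."""
    match = THINK_PATTERN.match(response)
    if match is None:
        return None, None  # no reasoning section found
    reasoning = match.group(1).strip()
    answer = match.group(2).strip()
    try:
        fields = json.loads(answer)
    except json.JSONDecodeError:
        return None, None  # the final answer is not valid JSON
    if not isinstance(fields, dict):
        return None, None  # valid JSON, but not a key-value object
    return reasoning, fields


# Hypothetical model output for a pathology report.
response = '<think>The summary table lists a tumor stage.</think>{"tumor stage": "T2"}'
reasoning, fields = split_response(response)
```

Only `fields` would be compared with the ground truth; the reasoning text is kept apart so that it cannot be mistaken for extracted data.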

The paper also explored different sampling strategies for training. One strategy involved randomly sampling key-value pairs from the ground truth JSON to create diverse queries, while another used all key-value pairs. The former led to shorter response lengths and faster reward growth during training, suggesting a more efficient learning process.
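The two sampling strategies can be sketched as follows. This is an illustration under stated assumptions: the function names, the query wording, and the choice to draw the subset size uniformly are hypothetical, since the summary only says that key-value pairs were either sampled at random or all used.

```python
import random


def sample_keys(truth: dict, strategy: str, rng: random.Random) -> list:
    """Pick which ground-truth keys a training query will ask for."""
    keys = list(truth)
    if not keys or strategy == "all":
        return keys  # second strategy: ask for every field
    if strategy == "random":
        # First strategy: ask for a random non-empty subset of fields.
        # Assumption: the subset size is drawn uniformly from 1..len(keys).
        size = rng.randint(1, len(keys))
        return rng.sample(keys, size)
    raise ValueError(f"unknown strategy: {strategy!r}")


def make_training_example(truth: dict, strategy: str, rng: random.Random):
    """Build a (query, target) pair; the query template is hypothetical."""
    keys = sample_keys(truth, strategy, rng)
    query = "Extract the following fields as JSON: " + ", ".join(keys)
    target = {key: truth[key] for key in keys}
    return query, target


truth = {"tumor stage": "T2", "biopsy results": "malignant", "margin": "clear"}
rng = random.Random(0)
query, target = make_training_example(truth, "random", rng)
```

Because a random subset usually asks for fewer fields than the full set, the expected JSON answer is shorter, which is consistent with the shorter response lengths reported for the random strategy.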

While the RLVR method shows impressive results on medical documents, the study also notes that its performance can drop on general VIE tasks that are dissimilar to the medical domain. This underscores the need for continued domain-specific optimization in VIE systems. Nevertheless, the research provides a compelling demonstration of how reinforcement learning, combined with smart data sampling and reasoning mechanisms, can unlock powerful capabilities in extracting structured information from complex visual documents, even with limited training data.