Researchers Create Massive Dataset to Advance Interactive Medical Image Segmentation
Shanghai, China – A team of researchers from Shanghai AI Laboratory, Sichuan University, and other institutions has released IMed-361M, a massive new benchmark dataset for interactive medical image segmentation (IMIS), along with a high-performing baseline model. Their work, published on arXiv, addresses a critical bottleneck in the field: the lack of large, diverse, and densely annotated datasets needed to train robust and generalizable IMIS models.
Interactive medical image segmentation allows doctors to guide a computer model’s segmentation process by providing input such as clicks, bounding boxes, or even text descriptions. This user-in-the-loop approach offers greater accuracy and flexibility than fully automated methods, particularly when dealing with unseen object classes or complex medical images. However, progress has been hampered by the limited availability of high-quality training data.
Existing public datasets, such as COSMOS, often contain only a few annotations per image, limiting the ability of models to learn comprehensive object representations and handle diverse interaction strategies. IMed-361M dramatically changes this landscape.
A Dataset of Unprecedented Scale
IMed-361M comprises 6.4 million medical images sourced from over 110 public and private datasets, spanning 14 imaging modalities (including CT, MRI, X-ray, and ultrasound) and 204 segmentation targets (organs, lesions, and other structures). The dataset's most striking feature is its density: an average of 56 masks (annotations) per image, for a total of 361 million masks. This density matters because it lets models learn comprehensive, fine-grained object representations and supports training across a wide range of simulated interaction strategies.
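As a quick back-of-the-envelope check (our arithmetic, not a calculation from the paper), the reported totals are internally consistent with the stated average:

```python
# Sanity check: do the reported totals match the stated per-image average?
total_images = 6.4e6   # 6.4 million images
total_masks = 361e6    # 361 million masks

masks_per_image = total_masks / total_images
print(f"Average masks per image: {masks_per_image:.1f}")  # ~56.4, in line with the reported 56
```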
The researchers leveraged the Segment Anything Model (SAM), a powerful foundation model for general image segmentation, to automate much of the annotation process. They carefully refined the automatically generated masks with a rigorous quality control pipeline and manual curation to ensure high accuracy and consistency.
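The article does not detail the exact quality-control steps, but a minimal sketch of how automatically generated candidate masks might be filtered (the helper names, criteria, and thresholds below are illustrative assumptions, not the authors' pipeline) could look like this:

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity between two binary masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def filter_candidate_masks(candidate_masks, min_area=100, max_overlap=0.9):
    """Illustrative quality-control pass over automatically generated masks:
    drop tiny fragments and near-duplicates (thresholds are assumptions)."""
    kept = []
    # Process larger masks first so duplicates collapse onto the largest instance.
    for mask in sorted(candidate_masks, key=lambda m: m.sum(), reverse=True):
        if mask.sum() < min_area:
            continue  # discard tiny, likely-noisy fragments
        if any(dice(mask, k) > max_overlap for k in kept):
            continue  # discard near-duplicates of masks already kept
        kept.append(mask)
    return kept
```

In the actual dataset construction, a filtering pass like this would be followed by the manual curation the researchers describe.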
A High-Performing Baseline Model
To demonstrate the dataset’s potential, the researchers developed IMIS-Net, a baseline IMIS model. IMIS-Net builds on the strengths of SAM and goes further by fine-tuning it specifically on IMed-361M’s medical data. The model integrates multiple interaction types (clicks, bounding boxes, and text prompts), giving users considerable flexibility in how they guide the segmentation.
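To illustrate what integrating several interaction types can mean in a SAM-style architecture, here is a minimal PyTorch sketch of a prompt encoder that maps clicks, boxes, and text features into a shared token space; the dimensions, layers, and names are assumptions for illustration, not the authors' actual design:

```python
import torch
import torch.nn as nn

class MultiPromptEncoder(nn.Module):
    """Sketch of a SAM-style prompt encoder mapping clicks, boxes, and
    text features to a shared token space (all sizes are illustrative)."""

    def __init__(self, embed_dim: int = 256, text_dim: int = 512):
        super().__init__()
        self.point_embed = nn.Linear(2, embed_dim)       # (x, y) click coordinates
        self.box_embed = nn.Linear(4, embed_dim)         # (x1, y1, x2, y2) boxes
        self.text_proj = nn.Linear(text_dim, embed_dim)  # e.g. precomputed text features

    def forward(self, points=None, boxes=None, text_feats=None):
        tokens = []
        if points is not None:
            tokens.append(self.point_embed(points))      # (num_points, embed_dim)
        if boxes is not None:
            tokens.append(self.box_embed(boxes))         # (num_boxes, embed_dim)
        if text_feats is not None:
            tokens.append(self.text_proj(text_feats))    # (num_prompts, embed_dim)
        # All prompt tokens are concatenated for a mask decoder to attend over.
        return torch.cat(tokens, dim=0)

# Hypothetical usage: one click plus one bounding box, no text prompt.
encoder = MultiPromptEncoder()
prompt_tokens = encoder(points=torch.tensor([[120.0, 85.0]]),
                        boxes=torch.tensor([[40.0, 30.0, 200.0, 180.0]]))
print(prompt_tokens.shape)  # torch.Size([2, 256])
```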
Evaluations against existing state-of-the-art IMIS models showed superior performance across a range of metrics. For example, IMIS-Net consistently outperformed other models on both image-level and mask-level Dice scores, particularly for modalities that are poorly represented in existing datasets. The researchers also highlight IMIS-Net's generalization: it performs well even on unseen data, underscoring the value of IMed-361M's diversity and dense annotation.
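The distinction between image-level and mask-level Dice is worth spelling out. One common way to compute the two aggregates is sketched below with hypothetical numbers; the paper's exact evaluation protocol may differ:

```python
import numpy as np

def aggregate_dice(per_mask_dice, image_ids):
    """Illustrative aggregation of Dice scores:
    - mask-level: average over every individual mask,
    - image-level: average within each image first, then across images."""
    mask_level = float(np.mean(per_mask_dice))
    image_level = float(np.mean([
        np.mean([d for d, img in zip(per_mask_dice, image_ids) if img == i])
        for i in set(image_ids)
    ]))
    return mask_level, image_level

# Hypothetical example: three masks in image "a", one mask in image "b".
scores = [0.90, 0.80, 0.70, 0.95]
images = ["a", "a", "a", "b"]
print(aggregate_dice(scores, images))  # mask-level 0.8375, image-level 0.875
```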
Implications and Future Directions
The release of IMed-361M and IMIS-Net represents a significant step forward for interactive medical image segmentation. The availability of such a large-scale, diverse, and densely annotated dataset is expected to spur innovation and accelerate the development of more accurate, robust, and clinically useful IMIS models.
The authors have made the IMed-361M dataset and IMIS-Net model publicly available, encouraging further research and development within the community. Future work might explore more sophisticated interaction paradigms, further refine annotation quality control, and improve model performance on challenging cases. The team hopes their contribution helps pave the way for wider adoption of IMIS technologies in clinical practice.