News Release 3-Jan-2025

Northwestern Polytechnical University team: Potential of multimodal large language models for data mining of medical images and free-text reports

Peer-Reviewed Publication

KeAi Communications Co., Ltd.

**Image:** Schematic overview of the evaluation tasks and methods

Credit: Yutong Zhang, et al.

In recent years, multimodal large language models (MLLMs) have increasingly demonstrated their potential in medical data mining. However, the diverse and heterogeneous nature of medical images and radiology reports poses significant challenges to the universality of data mining methods.

To address these challenges, a team led by Dr. Xin Zhang from the Institute of Medical Research, Northwestern Polytechnical University in Xi’an, China, systematically evaluated the performance of Gemini and GPT-series models across various medical tasks.

“Our study encompasses 14 diverse medical datasets, spanning dermatology, radiology, dentistry, ophthalmology and endoscopy image categories, as well as radiology report datasets,” shares Zhang. “The tasks evaluated include disease classification, lesion segmentation, anatomical localization, disease diagnosis and report generation.”

The results reveal that the Gemini series excels in report generation and lesion detection, while the GPT series demonstrates strengths in lesion segmentation and anatomical localization.

“The study highlights the promise of these multimodal models in alleviating the burden on clinicians and fostering the integration of AI into clinical practice, potentially mitigating healthcare resource constraints,” adds Zhang. “Nonetheless, further optimization and rigorous validation are required before clinical deployment.”

The team published their findings in the KeAi journal Meta-Radiology.

By establishing benchmarks for the performance of multimodal AI systems, the team’s efforts provide a foundation for the continued development and application of such technologies, as well as future research on the multimodal integration of medical imaging and textual analysis.

###

Contact the author: Xin Zhang, Institute of Medical Research, Northwestern Polytechnical University, xzhang@nwpu.edu.cn

The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to unfold quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 200 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).

Journal

Meta-Radiology

DOI

10.1016/j.metrad.2024.100103

Method of Research

Systematic review

Subject of Research

People

Article Title

Potential of multimodal large language models for data mining of medical images and free-text reports.

COI Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
