13-May-2026
Comparing AI anatomy segmentation models when ground truth is missing
SPIE--International Society for Optics and Photonics
Peer-Reviewed Publication
Artificial intelligence models that automatically label organs and anatomical structures in medical images are increasingly used to study large public imaging datasets, but comparing these tools is difficult when no expert reference annotations exist. A recent study developed an open-source framework for evaluating and comparing AI-based anatomy segmentation models using inter-model agreement rather than ground truth.

Applying the approach to six widely used segmentation models and chest CT scans from the National Lung Screening Trial, the researchers normalized all model outputs to follow a standard lexicon and stored the results in a common medical imaging format. This normalized representation made it possible to visualize and compare model outputs, and to analyze how consistently each model segmented key structures such as the lungs, heart, ribs, vertebrae, and sternum.

The results show strong agreement across models for lung segmentation, but substantial inconsistencies and systematic errors for bones and some cardiac structures, particularly among models trained on similar data. By combining quantitative agreement metrics with interactive visualization tools, the framework helps identify reliable models and flag problem cases, offering a practical way to guide model selection and the large-scale reuse of AI-generated annotations.
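To illustrate the core idea of comparing models without ground truth, the sketch below computes pairwise overlap between segmentation masks produced by different models for the same structure. This is a hypothetical minimal example, not the study's actual code: the Dice coefficient is a standard overlap measure, but the specific function names (`dice`, `pairwise_agreement`) and the toy masks are assumptions for illustration.

```python
# Hypothetical sketch of inter-model agreement: when no expert reference
# annotation exists, pairwise Dice overlap between model outputs serves
# as a ground-truth-free consistency score.
from itertools import combinations

import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two boolean masks (1.0 = identical)."""
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total


def pairwise_agreement(masks: dict) -> float:
    """Mean Dice over all pairs of models for one anatomical structure."""
    scores = [dice(masks[i], masks[j]) for i, j in combinations(masks, 2)]
    return sum(scores) / len(scores)


# Toy example: three "models" segmenting the same structure on one slice.
rng = np.random.default_rng(0)
base = rng.random((32, 32)) > 0.5
masks = {
    "modelA": base,          # agrees with modelB
    "modelB": base.copy(),   # identical output
    "modelC": ~base,         # systematically disagrees
}
print(pairwise_agreement(masks))
```

In the study's setting, a structure such as the lungs would yield a high agreement score across models, while a structure where one model makes systematic errors (as reported for some bones and cardiac structures) would pull the score down and flag that structure for inspection.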
Journal: Journal of Medical Imaging