
Evaluating AI Anatomy Segmentation Models Without Ground Truth Data

Bioengineer by Bioengineer
May 13, 2026
in Biology

In the rapidly advancing field of medical imaging, the advent of artificial intelligence (AI) has revolutionized how vast collections of scans are analyzed. Automated anatomy segmentation—where AI models label organs and structures in images such as chest CT scans—has become a cornerstone for enabling large-scale studies previously infeasible due to the need for painstaking manual annotation. However, as the number of segmentation models has multiplied, researchers have grappled with a fundamental challenge: how to objectively compare these AI tools in the absence of expert-verified ground truth.

A recent study published in the Journal of Medical Imaging has shed new light on this problem, proposing a robust and practical framework to evaluate concordance among different AI-based anatomy segmentation models without relying on expert annotations as a gold standard. This work centers on chest CT images sourced from the National Lung Screening Trial (NLST), a widely used public dataset for cancer research, ensuring high relevance and applicability to real-world clinical and research scenarios.

The dilemma stems from the nature of public datasets like NLST, which, despite containing thousands of imaging volumes, lack comprehensive organ and bone segmentations. Manual annotations for such intricate structures are prohibitively time-consuming and require highly skilled radiologists, rendering complete ground truth labeling impractical. AI models can generate these labels automatically, yet disparities arise because each model may use different terminology, boundary definitions, or anatomical inclusion criteria. Without a consensus or external standard, pinpointing the superior model has remained a vexing conundrum.

Addressing this, the investigators embraced a paradigm shift: they evaluated AI segmentation tools based on their agreement rather than absolute accuracy. The hypothesis is elegant—if independently developed models concur in labeling a structure, that concordance likely indicates a reliable and valid anatomical segmentation. Rather than seeking the elusive “correct” answer, the study quantifies where AI tools align and where they diverge on the same dataset.

Achieving direct model comparison necessitated a standardized baseline. The researchers selected six prominent open-source segmentation models, including TotalSegmentator (two versions), Auto3DSeg, MOOSE, MultiTalent, and CADS. Despite their differing original output formats and nomenclature, the team harmonized all results by converting them into an interoperable DICOM segmentation standard. Furthermore, they unified labels using the SNOMED-CT vocabulary—a widely accepted medical ontology—assigning uniform color codes and identifiers to anatomical regions. This harmonization enabled side-by-side visualization of segmentations from different models on the very same scan, facilitating accurate comparison.

To enhance accessibility, the study leveraged two powerful open-source platforms widely embraced in medical imaging research: OHIF Viewer, a browser-based tool, and 3D Slicer, a robust desktop application. They extended these viewers with bespoke integrations and plugins capable of displaying multiple segmentations simultaneously in three-dimensional and orthogonal two-dimensional views. This user-friendly interface allows researchers to interactively explore congruence and discrepancies among models for individual organs and structures with unprecedented ease.

The analytic phase focused on a carefully curated subset of 18 chest CT scans from different NLST participants. After filtering out partially imaged or inconsistently detected anatomical structures, the study concentrated on 24 key regions, including lung lobes, the heart, ribs, thoracic vertebrae, and the sternum. For each structure, the authors identified a “consensus” segmentation defined as the voxel set concurrently labeled by all models recognizing that anatomical part. Subsequent comparisons measured how each model’s output overlapped with this consensus region, employing metrics that quantify shape similarity and volumetric congruence.
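The consensus comparison described above can be sketched in a few lines of NumPy: the consensus mask is the voxelwise intersection of every model's binary mask for a structure, and each model is then scored by its overlap with that consensus. The Dice coefficient used here is a standard overlap metric chosen for illustration; the paper's exact metric set may differ.

```python
import numpy as np

def consensus_mask(masks):
    """Voxelwise intersection of binary masks from all participating models."""
    out = masks[0].astype(bool)
    for m in masks[1:]:
        out &= m.astype(bool)
    return out

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy example: three models segment the same structure on a tiny volume,
# with one model disagreeing on a single voxel.
rng = np.random.default_rng(0)
base = rng.random((8, 8, 8)) > 0.5
masks = [base, base.copy(), base.copy()]
masks[2][0, 0, 0] ^= True

cons = consensus_mask(masks)
scores = {f"model_{i}": dice(m, cons) for i, m in enumerate(masks)}
```

Because the consensus is an intersection, it is always contained in every model's mask, so a low Dice score against it flags a model that labels substantially more (or differently placed) tissue than its peers agree on.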

These quantitative results were further distilled into interactive plots enabling rapid identification of outlier models or scans exhibiting problematic segmentations. Notably, the team released a publicly accessible interactive website to disseminate these findings, inviting the broader research community to examine the detailed concordance metrics and underlying imaging data themselves, fostering transparency and collaborative refinement.

Results illuminated variable performance across structures. Lung segmentation demonstrated remarkable agreement, with high overlap and nearly indistinguishable boundaries across all models. This consistency highlights the maturity of lung segmentation technologies—likely a function of abundant training data and well-defined anatomical landmarks. In contrast, heart segmentations initially showed moderate concordance owing primarily to one outlier model adopting a narrower definition of the heart. Excluding this model markedly improved overall alignment among the remainder.

Bone structures revealed greater challenges. Four of the six models manifested frequent errors in rib and thoracic vertebrae labels, including merges of adjacent bones or misidentification of vertebral levels. Conversely, two models trained on distinct datasets produced notably more consistent and anatomically comprehensive segmentations. These subtleties eluded aggregate statistics but emerged clearly through simultaneous visual scrutiny, underscoring the indispensability of combined quantitative and qualitative evaluation techniques.

This investigation underscores a crucial insight: even highly cited AI segmentation models can harbor systematic weaknesses, particularly when trained on overlapping or limited data. It also validates a novel pathway for meaningful model assessment without the prohibitive cost of manual ground truth annotation. By integrating standardized atlases, ontology-driven label harmonization, automated voxelwise comparison, and interactive visualization, this framework provides a reproducible, scalable solution for evaluating medical imaging AI tools.

Beyond its immediate findings, this work promotes a vital cultural shift in biomedical AI research—from chasing a mythical single “best” model to embracing evidence-based decision-making informed by comparative strengths and weaknesses. The open availability of software, label mappings, and sample datasets offers the community an invaluable toolkit applicable not only to chest CT anatomy but extensible to other modalities and segmentation tasks.

As AI becomes integral to clinical workflows and population-scale studies alike, transparent evaluation frameworks like this will be indispensable. They empower data scientists, clinicians, and researchers to select segmentation models thoughtfully, gauge reliability, and appreciate limitations—ultimately enhancing the trustworthiness and impact of AI in healthcare.

In a landscape increasingly reliant on AI-generated annotations, the study by L. Giebeler et al. pioneers a path that balances rigor with practicality. Their approach bridges methodological divides, nurtures collaboration, and elevates the standard of medical image analysis through collective truth-seeking, even when classical ground truths remain elusive.

Subject of Research: Not applicable
Article Title: In search of truth: evaluating concordance of AI-based anatomy segmentation models
News Publication Date: 3-Apr-2026
Web References:

https://www.spiedigitallibrary.org/journals/journal-of-medical-imaging/volume-13/issue-06/062204/In-search-of-truth–evaluating-concordance-of-AI-based/10.1117/1.JMI.13.6.062204.full
http://dx.doi.org/10.1117/1.JMI.13.6.062204
References:
Giebeler L., et al., “In search of truth: evaluating concordance of AI-based anatomy segmentation models,” Journal of Medical Imaging, 13(6), 062204 (2026).
Image Credits: L. Giebeler et al.
Keywords: Artificial intelligence, Medical imaging, Anatomy, Anatomy segmentation, AI evaluation, Chest CT, National Lung Screening Trial, Open-source models, DICOM segmentation, SNOMED-CT, 3D Slicer, OHIF Viewer




Bioengineer.org © Copyright 2023 All Rights Reserved.
