Diagnostics (Basel, Switzerland)

Reproducibility of AI in Cephalometric Landmark Detection: A Preliminary Study.

David Emilio Fracchia, Denis Bignotti, Stefano Lai, Stefano Cubeddu, Fabio Curreli, Massimiliano Lombardo, Alessio Verdecchia, Enrico Spinas

Published: 2025

DOI: 10.3390/diagnostics15192521

Abstract

Open Access

Objectives: This study aimed to evaluate the reproducibility of artificial intelligence (AI) in identifying cephalometric landmarks, comparing its performance with manual tracing by an experienced orthodontist.

Methods: A high-quality lateral cephalogram of a 26-year-old female patient, meeting strict inclusion criteria, was selected. Eighteen cephalometric landmarks were identified using the WebCeph software (version 1500) in three experimental settings: AI tracing without image modification (AInocut), AI tracing with image modification (AI-cut), and manual tracing by an orthodontic expert. In each setting, the procedure was repeated 10 times on the same image. X and Y coordinates were recorded, and reproducibility was assessed using the coefficient of variation (CV) and centroid distance analysis. Statistical comparisons were performed using one-way ANOVA and Bonferroni post hoc tests, with significance set at p < 0.05.

Results: AInocut achieved the highest reproducibility, showing the lowest mean CV values. Both AI methods demonstrated greater consistency than manual tracing, particularly for landmarks such as Menton (Me) and Pogonion (Pog). Gonion (Go) showed the highest variability across all groups. Significant differences were found for the Posterior Nasal Spine (PNS) point (p = 0.001), where AI outperformed manual tracing. Variability was generally higher along the X-axis than the Y-axis.

Conclusions: AI demonstrated superior reproducibility in cephalometric landmark identification compared to manual tracing by an experienced operator. While certain points showed high consistency, others, particularly PNS and Go, remained challenging. These findings support AI as a reliable adjunct in digital cephalometry, although the use of a single radiograph limits generalizability. Broader, multi-image studies are needed to confirm clinical applicability.
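The two reproducibility measures named in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes the CV is the sample standard deviation divided by the mean of each coordinate (as a percentage), and that the centroid distance is the Euclidean distance of each repeated tracing from the mean position of all repetitions. The function names and the coordinates are hypothetical and are not data from the study.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage (assumed definition)."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / abs(mean)


def centroid_distances(points):
    """Euclidean distance of each repeated tracing from the centroid of all tracings."""
    cx = statistics.mean([x for x, _ in points])
    cy = statistics.mean([y for _, y in points])
    return [math.hypot(x - cx, y - cy) for x, y in points]


# Made-up pixel coordinates for one landmark traced five times (illustration only).
tracings = [(412.0, 655.0), (414.0, 653.0), (411.0, 656.0), (413.0, 654.0), (415.0, 657.0)]

xs = [x for x, _ in tracings]
ys = [y for _, y in tracings]

cv_x = coefficient_of_variation(xs)
cv_y = coefficient_of_variation(ys)
mean_dist = statistics.mean(centroid_distances(tracings))

print(f"CV X: {cv_x:.3f}%  CV Y: {cv_y:.3f}%  mean centroid distance: {mean_dist:.3f} px")
```

Under these assumptions, a lower CV and a smaller mean centroid distance both indicate tighter clustering of the repeated tracings. Note that a CV computed on raw coordinates depends on where the image origin lies, whereas the centroid distance does not.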