International Journal of Computer Assisted Radiology and Surgery

Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery.

Richard Bieck, Martin Sorge, Katharina Heuermann, Viktor Kunz, Markus Pirlich, Thomas Neumuth

Published: 2025

DOI: 10.1007/s11548-025-03512-z

Abstract

PURPOSE: Deep learning models for endoscopic assistance applications predominantly focus on image-based tasks, such as tool detection, anatomical classification, and workflow segmentation. However, these approaches often neglect the integration of na…

