International journal of computer assisted radiology and surgery
Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery.
Richard Bieck, Martin Sorge, Katharina Heuermann, Viktor Kunz, Markus Pirlich, Thomas Neumuth
Published: 2025; DOI: 10.1007/s11548-025-03512-z
Abstract
PURPOSE: Deep learning models for endoscopic assistance applications predominantly focus on image-based tasks, such as tool detection, anatomical classification, and workflow segmentation. However, these approaches often neglect the integration of na…