Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery. — SciRadar