IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Hierarchical Multimodal Knowledge Matching for Training-Free Open-Vocabulary Object Detection.

Qisen Ma, Yan Huang, Zikun Liu, Hyunhee Park, Liang Wang

Published: 2025

DOI: 10.1109/TIP.2025.3618408

Abstract

Open-Vocabulary Object Detection (OVOD) aims to leverage the generalization capabilities of pre-trained vision-language models for detecting objects beyond the trained categories. Existing methods mostly focus on supervised learning strategies based…
