IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society
Hierarchical Multimodal Knowledge Matching for Training-Free Open-Vocabulary Object Detection.
Qisen Ma, Yan Huang, Zikun Liu, Hyunhee Park, Liang Wang
Published: 2025
DOI: 10.1109/TIP.2025.3618408
Abstract
Open-Vocabulary Object Detection (OVOD) aims to leverage the generalization capabilities of pre-trained vision-language models for detecting objects beyond the trained categories. Existing methods mostly focus on supervised learning strategies based…