Learning Hierarchically Consistent Disentanglement with Multi-Channel Augmentation for Public Security-Oriented Sketch Person Re-Identification.
Yu Ye, Zhihong Sun, Jun Chen
Abstract
Sketch re-identification (Re-ID) aims to retrieve pedestrian photographs from a gallery using a query sketch drawn by a professional, a capability that is crucial for criminal investigations and missing-person searches in the field of public security. The main challenge of this task lies in bridging the significant modality gap between sketches and photos while extracting discriminative, modality-invariant features. However, the information asymmetry between sketches and RGB photographs, particularly the absence of color information in sketches, severely interferes with cross-modal matching. To address this challenge, we propose a novel network architecture that integrates multi-channel augmentation with hierarchically consistent disentanglement learning. Specifically, a multi-channel augmentation module is developed to mitigate the interference of color bias in cross-modal matching. Furthermore, a modality-disentangled prototype (MDP) module is introduced to decompose pedestrian representations at the feature level into modality-invariant structural prototypes and modality-specific appearance prototypes. Additionally, a cross-layer decoupling consistency constraint is designed to keep the disentangled prototypes semantically coherent across different network layers and to improve the stability of the overall decoupling process. Extensive experimental results on two public datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
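The abstract does not specify how the multi-channel augmentation module operates. As a rough illustration of the general idea of reducing color bias, the sketch below shows one common form of channel-level augmentation: a photo is randomly replaced by a single color channel replicated three times, by a grayscale version, or left unchanged. The function names (`replicate_channel`, `to_gray`, `augment`), the uniform choice among the five options, and the BT.601 luma weights are all assumptions made for this example and are not taken from the paper.

```python
import random

# An image is modeled as rows of (R, G, B) integer pixels (assumption for this sketch).
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def replicate_channel(image: Image, index: int) -> Image:
    """Keep one color channel (0=R, 1=G, 2=B) and copy it into all three."""
    return [[(px[index],) * 3 for px in row] for row in image]


def to_gray(image: Image) -> Image:
    """Convert to a three-channel grayscale image using BT.601 luma weights."""
    out: Image = []
    for row in image:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def augment(image: Image, rng: random.Random) -> Image:
    """Pick one of five views uniformly: R, G, B, grayscale, or the original."""
    choice = rng.randrange(5)
    if choice < 3:
        return replicate_channel(image, choice)
    if choice == 3:
        return to_gray(image)
    return image


if __name__ == "__main__":
    photo: Image = [[(200, 100, 50), (10, 20, 30)]]
    print(augment(photo, random.Random(0)))
```

Each color-reduced view discards hue while preserving edges and shape, which is the kind of structure a sketch also carries; whether the paper's module uses these particular views is not stated in the abstract.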