Transformer-based vehicle re-identification with view information
Mingdong Zhu, Qinghe Feng
Abstract
With the advance of the industrial Internet of Things, automated environments such as smart cities are on the horizon. Vehicle re-identification (ReID) aims to retrieve images of the same vehicle from large-scale image datasets. Its main challenges are intra-identity differences, caused by different views of the same vehicle, and inter-identity similarity, caused by the same view of similar-looking vehicles. To address these challenges, we propose a novel Transformer-based Vehicle ReID model with View Information (TVRVI). First, we annotate vehicle images with five different view parts and train a parsing network that uses this view information to split images into view parts and output their corresponding view labels. Second, we design a dual-branch transformer-based parsing network that extracts features of different views separately and reduces their entanglement. Guided by view information, the local branch learns fine-grained representations of each view, while the global branch simultaneously learns an overall feature of the whole image. By comparing features from the same view and leaving out features from different views, TVRVI makes good use of common-view features and avoids interference from uncommon-view features, which helps reduce both intra-identity differences and inter-identity similarity between vehicle images. Experiments on four public vehicle datasets demonstrate the effectiveness of our method, which achieves state-of-the-art results, and ablation studies are specifically designed to verify the contribution of view information.
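The matching idea described above (compare a global feature, and compare local features only for views visible in both images) can be illustrated with a small sketch. This is not the TVRVI implementation. The view names, the use of Euclidean distance, averaging over shared views, and the additive `local_weight` combination are all assumptions made for illustration. Feature extraction by the transformer branches is out of scope here, so features are passed in as plain lists.

```python
import math
from typing import Dict, Sequence

# Hypothetical view labels: the abstract mentions five view parts but does not
# name them, so these strings are placeholders.
VIEWS = ("front", "rear", "left", "right", "top")

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain L2 distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def view_aware_distance(
    global_a: Vector,
    global_b: Vector,
    local_a: Dict[str, Vector],
    local_b: Dict[str, Vector],
    local_weight: float = 1.0,  # assumed weighting, not taken from the paper
) -> float:
    """Global distance plus the mean local distance over views seen in BOTH images.

    local_a / local_b map a view label to that view's feature vector; a view
    missing from a dict is treated as not visible in that image. Views visible
    in only one image are skipped, so they cannot add spurious distance.
    """
    d_global = euclidean(global_a, global_b)
    common = [v for v in VIEWS if v in local_a and v in local_b]
    if not common:
        # No shared view: fall back to the global comparison alone.
        return d_global
    d_local = sum(euclidean(local_a[v], local_b[v]) for v in common) / len(common)
    return d_global + local_weight * d_local


if __name__ == "__main__":
    query_global = [0.0, 0.0]
    query_local = {"front": [1.0, 0.0], "left": [0.0, 1.0]}
    gallery_global = [3.0, 4.0]
    # The "rear" feature differs wildly, but the query has no rear view, so it is ignored.
    gallery_local = {"front": [1.0, 2.0], "rear": [9.0, 9.0]}
    print(view_aware_distance(query_global, gallery_global, query_local, gallery_local))  # 7.0
```

In the usage example, only the `front` view is shared, so the score is the global distance (5.0) plus the front-view distance (2.0). The unmatched `rear` feature contributes nothing, which mirrors the abstract's point about avoiding interference from uncommon-view features.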