A lightweight infrared remote sensing architecture for enhanced small target detection using improved DETR with CST modules.
Hongyi Duan, Jinyang Niu, Junjie Hao, Pengyue Hao, Jijiang Xu
Abstract
Infrared remote sensing (IRS) ship detection is hindered by low resolution and environmental interference, and these problems are most pronounced for small targets. This study proposes RT-DETR-CST, a lightweight architecture based on RT-DETR that comprises three modules. First, a Cross-Channel Feature Attention Network (CFAN) performs channel-weighted feature fusion via residual connections to suppress invalid background channels, addressing inter-channel information imbalance in infrared images and the suppression of small-target features by background noise. Second, a Scale-Wise Feature Network (SWN) uses depthwise separable convolutions and stochastic depth for multi-scale feature extraction, where stochastic depth enhances the model's robustness to small-target features. Third, a Texture/Detail Capture Network (TCN) captures edges and details through linear decomposition and low-cost channel fusion, mitigating the target edge blurring and detail feature loss caused by low signal-to-noise ratios in infrared images. Experiments on the ISDD dataset show that RT-DETR-CST achieves an mAP0.5 of 89.4% (a 4.9% improvement over RT-DETR), reduces model size to 23.7 MB (a 41.5% reduction), and reaches an inference speed of 207.2 FPS. Ablation experiments validate the effectiveness of each module, demonstrating the model's superior accuracy, lightweight design, and real-time performance in small-target detection of ships in infrared remote sensing imagery. Furthermore, generalization experiments on the SSDD and SIRST datasets show that the proposed model is effective for small-target detection in both infrared and SAR remote sensing.
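To illustrate why depthwise separable convolutions, as used in SWN, suit a lightweight design, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) and the function names are hypothetical and are not taken from RT-DETR-CST; bias terms are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 256, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

Under these assumed sizes the separable layer needs roughly one ninth of the weights, since the ratio reduces to 1/c_out + 1/k^2. The model-size reduction reported above is for the full network and cannot be derived from this single-layer comparison.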