Enhanced small object detection in UAV aerial imagery through attention gated backbone and context aware fusion.
WenFeng Li, YuCheng Zhang
Abstract
Detecting small objects in unmanned aerial vehicle (UAV) imagery remains difficult because targets are represented by few pixels, backgrounds are cluttered, and edge devices impose strict computational limits. Conventional detectors struggle to build adequate feature representations and to integrate contextual information when targets typically occupy fewer than [Formula: see text] pixels. This paper introduces AGEC-DETR, an RT-DETR-based framework designed for UAV small object detection and built on three architectural components. The Attention-Gated Enhanced Backbone (AGEB) combines single-head self-attention with convolutional gated linear units to strengthen feature extraction and capture both local and global contextual dependencies. The Efficient Small Object Pyramid preserves and amplifies small object features through SPDConv and CSP-MSFE structures, mitigating the feature dilution common in conventional pyramid architectures. The Context-Aware Fusion Module adaptively integrates multi-scale features using context-aware mechanisms, improving target-background discrimination. On the VisDrone 2019 dataset, the proposed approach achieves 23.1% AP and 18.8% [Formula: see text], improvements of 2.3% and 3.1%, respectively, over the baseline RT-DETR-R18. Compared with RT-DETR-R50, the method retains higher accuracy while reducing parameters and computational cost by 65.5% and 51.3%, respectively. Evaluation on the DOTA and UAVDT datasets indicates that the method generalizes well across datasets, supporting its suitability for deployment on resource-constrained UAV platforms in applications such as urban surveillance, traffic monitoring, and smart city infrastructure.
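As an illustration of why a space-to-depth step can help preserve small object detail, the following is a minimal sketch of the rearrangement that SPDConv-style layers are generally built on. It is not the authors' implementation: the function name `space_to_depth`, the `scale` parameter, and the channel ordering within each block are assumptions made for this example, and the non-strided convolution that normally follows is omitted. The point it shows is that resolution is halved without discarding any pixel values, unlike strided convolution or pooling.

```python
def space_to_depth(x, scale=2):
    """Rearrange an H x W x C nested list into (H/scale) x (W/scale) x (C*scale**2).

    Hypothetical helper for illustration only; the channel ordering inside
    each scale x scale block is an assumption, not taken from the paper.
    """
    height, width = len(x), len(x[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for i in range(0, height, scale):
        row = []
        for j in range(0, width, scale):
            channels = []
            # Stack every pixel of the scale x scale block along the channel axis.
            for di in range(scale):
                for dj in range(scale):
                    channels.extend(x[i + di][j + dj])
            row.append(channels)
        out.append(row)
    return out


# A 4 x 4 single-channel image whose pixel values are 0..15.
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
packed = space_to_depth(image)  # 2 x 2 spatial grid, 4 channels per cell
```

Here the 4 x 4 x 1 input becomes a 2 x 2 x 4 output, so every original pixel value survives the downsampling and remains available to later layers.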