YOLO-AVCA-CBAMNet: Attention-driven framework for detection and classification of green pepper maturity stages.
Bipin Nair B J, Abrav Nanda K M, V Raghavendra
Abstract
Accurate identification of pepper berry maturity is essential for optimal harvest timing and for maintaining quality standards in spice production. This study proposes "YOLO-AVCA-CBAMNet", an integrated detection-and-classification framework designed to operate effectively under natural field conditions. The evaluation is based on a self-collected dataset of pepper berries captured with a smartphone across diverse illumination settings and background complexities. The pipeline first applies YOLOv8 to detect individual berries within cluttered scenes. The extracted berry regions are then classified using convolutional neural networks enhanced with two complementary attention mechanisms. The Adaptive Visual Cortex Attention Module (AVCAM) strengthens global contextual weighting by adaptively recalibrating salient features, while the Convolutional Block Attention Module (CBAM) improves spatial and channel-specific discrimination through sequential attention refinement. This dual-attention design enables more reliable separation of visually similar maturity stages. Experimental results indicate accuracy gains of 5-9% across all evaluated backbone architectures, with the DenseNet121-based configuration achieving a peak accuracy of 96.19%.
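The sequential channel-then-spatial refinement attributed to CBAM above can be illustrated with a minimal, dependency-free sketch of the standard CBAM formulation. In this sketch, channel attention passes average- and max-pooled channel descriptors through a shared two-layer MLP. Spatial attention then applies a k×k convolution to channel-wise average and max maps, and each stage rescales the features by a sigmoid gate. The weights are random placeholders, and several details are simplifying assumptions rather than this paper's configuration: MLP and convolution biases are omitted, the reduction ratio `r = 4` and the tiny 8×5×5 feature map are arbitrary, and only the 7×7 kernel follows the common CBAM default. AVCAM is not sketched, because its internal design is not specified here.

```python
import math
import random

# Feature maps are nested lists indexed [channel][row][col].

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    # Bias-free dense layer: w has shape (out, in), v has length `in`.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def channel_attention(fmap, w1, w2):
    """Shared MLP over average- and max-pooled channel descriptors."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(r) for r in ch) for ch in fmap]

    def mlp(v):
        hidden = [max(0.0, h) for h in matvec(w1, v)]  # ReLU bottleneck
        return matvec(w2, hidden)

    gates = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[gates[c] * x for x in row] for row in ch] for c, ch in enumerate(fmap)]

def spatial_attention(fmap, kernel):
    """k x k convolution over channel-wise average and max maps."""
    n_ch, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg_map = [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch for j in range(w)] for i in range(h)]
    max_map = [[max(fmap[c][i][j] for c in range(n_ch)) for j in range(w)] for i in range(h)]
    pooled = [avg_map, max_map]
    k = len(kernel[0])
    pad = k // 2
    gate = []
    for i in range(h):
        row = []
        for j in range(w):
            s = 0.0
            for p in range(2):  # the two pooled input maps
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            s += kernel[p][di][dj] * pooled[p][y][x]
            row.append(sigmoid(s))
        gate.append(row)
    return [[[gate[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in fmap]

def cbam(fmap, w1, w2, kernel):
    # Sequential refinement: channel attention first, then spatial attention.
    return spatial_attention(channel_attention(fmap, w1, w2), kernel)

# Usage with random placeholder weights (illustrative only, not trained values).
random.seed(0)
C, H, W, r, k = 8, 5, 5, 4, 7
fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.gauss(0, 0.1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
refined = cbam(fmap, w1, w2, kernel)
```

Because both gates lie in (0, 1), the refined map keeps the input's shape and can only attenuate each activation. In a trained network, this attenuation is what lets the module emphasize informative channels and regions.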
The findings demonstrate the potential of attention-driven models to support interpretable, efficient, and scalable maturity assessment in precision agriculture.
• Developed an end-to-end framework, "YOLO-AVCA-CBAMNet", that integrates object detection with attention-driven classification for pepper maturity assessment under natural field conditions.
• Employed a field-derived image dataset of pepper berries collected under naturally varying illumination and environmental conditions, supporting the ecological validity and practical relevance of the proposed maturity assessment approach.
• Incorporated complementary attention mechanisms (AVCAM to enhance global contextual representation and CBAM to refine spatial and channel-specific feature responses), thereby improving discrimination among visually similar maturity stages.
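The end-to-end flow described in the highlights (detect berries, crop each detected region, classify its maturity stage) can be outlined as a short orchestration sketch. The `detect` and `classify` callables are hypothetical stand-ins. In the described framework, they would correspond to the YOLOv8 detector and the attention-enhanced CNN classifier. The `min_score` threshold, the `(x1, y1, x2, y2)` box convention, and the stage names `stage_1`/`stage_2` are also assumptions chosen for illustration. The toy classifier simply compares mean red and green intensities and does not represent the paper's model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]            # (R, G, B)
Image = List[List[Pixel]]               # rows of pixels
Box = Tuple[int, int, int, int]         # (x1, y1, x2, y2), assumed convention

@dataclass
class Detection:
    box: Box
    score: float

@dataclass
class BerryResult:
    box: Box
    stage: str
    confidence: float

def crop(image: Image, box: Box) -> Image:
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def assess_maturity(image: Image,
                    detect: Callable[[Image], List[Detection]],
                    classify: Callable[[Image], Tuple[str, float]],
                    min_score: float = 0.5) -> List[BerryResult]:
    """Stage 1: detect berries. Stage 2: classify each cropped berry."""
    results = []
    for det in detect(image):
        if det.score < min_score:
            continue  # drop low-confidence detections before classification
        stage, conf = classify(crop(image, det.box))
        results.append(BerryResult(det.box, stage, conf))
    return results

# Hypothetical stand-ins for the detector and the maturity classifier.
def toy_detector(image: Image) -> List[Detection]:
    return [Detection((0, 0, 5, 5), 0.9),
            Detection((5, 5, 10, 10), 0.8),
            Detection((2, 2, 4, 4), 0.3)]

def toy_classifier(patch: Image) -> Tuple[str, float]:
    pixels = [p for row in patch for p in row]
    mean_r = sum(p[0] for p in pixels) / len(pixels)
    mean_g = sum(p[1] for p in pixels) / len(pixels)
    stage = "stage_1" if mean_g > mean_r else "stage_2"
    return stage, max(mean_r, mean_g) / (mean_r + mean_g + 1e-9)

# Toy 10x10 image: green left half, red right half.
image = [[(0, 200, 0) if x < 5 else (200, 0, 0) for x in range(10)] for _ in range(10)]
results = assess_maturity(image, toy_detector, toy_classifier)
for res in results:
    print(res.box, res.stage, round(res.confidence, 3))
```

Keeping detection and classification behind separate callables mirrors the two-stage design. The classifier sees only a tightly cropped berry rather than the cluttered full scene, and either stage can be swapped (for example, a different backbone) without changing the orchestration code.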