Influences and Training Strategies for Effective Object Detection in Challenging Environments Using YOLO NAS-L.
Gerald Steindl, Arnold Baca, Philipp Kornfeind
Abstract
YOLO (You Only Look Once) is a one-stage detector that predicts object classes and bounding boxes in a single pass, without an explicit region proposal step. In contrast, two-stage detectors first generate candidate regions and then classify them. The YOLO NAS-L model is specifically designed to improve the detection of small objects. This study systematically investigates the influence of dataset characteristics, training strategies, and hyperparameter selection on the performance of YOLO NAS-L in a challenging object detection scenario: detecting swimmers in aquatic environments. Two evaluation metrics are used: the mean Average Precision (mAP), which reflects the model's global precision-recall performance, and the F1-score, which indicates the model's effectiveness under realistic operating conditions. With these metrics, the study examines the effects of batch size, batch accumulation, number of training epochs, image resolution, pre-trained weights, and data augmentation. Our findings indicate that while batch size and image resolution had limited impact on these metrics, the use of batch accumulation, pre-trained weights, and careful tuning of the number of training epochs were critical for optimizing model performance. The results highlight the practical significance of combining optimized hyperparameters, training strategies, and pre-trained weights to efficiently develop high-performing YOLO NAS-L models.
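Two of the quantities named above reduce to simple arithmetic, sketched below for illustration. The F1-score is the harmonic mean of precision and recall at a fixed confidence threshold, and batch accumulation raises the effective batch size by summing gradients over several mini-batches before each optimizer step. The function names and the example numbers are hypothetical and are not taken from the study or from any YOLO NAS training API.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from detection counts at one confidence threshold.

    tp, fp, fn are true positives, false positives and false negatives
    (hypothetical inputs, not results from the study).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Number of samples contributing to one optimizer step
    when gradients are accumulated over several mini-batches."""
    return batch_size * accumulation_steps


if __name__ == "__main__":
    # Illustrative values only.
    print(f1_score(tp=80, fp=20, fn=20))
    print(effective_batch_size(batch_size=8, accumulation_steps=4))
```

Under this reading, a small per-step batch size that fits in GPU memory can still approximate the gradient statistics of a larger batch, which is one plausible reason accumulation matters more than the raw batch size.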