Scientific Reports

Reinforcement learning-driven feature selection enhanced by an evolutionary approach tuning for criminal suspect identification.

Zhenming Gao, Zhang Jian, Seyed Jalaleddin Mousavirad

Published: 2025

DOI: 10.1038/s41598-025-25920-6

Open Access

Abstract

Accurate identification of criminal suspects is crucial for ensuring justice and deterring future crimes. Convolutional neural networks (CNNs) are frequently used to identify suspects. However, conventional CNN-based methods often struggle with feature selection (FS), class imbalance, and hyperparameter tuning, which diminishes their overall effectiveness. To overcome these obstacles, this study introduces a strategy based on reinforcement learning (RL), specifically off-policy proximal policy optimization (Off-policy PPO), to address FS and class imbalance. This strategy is complemented by a differential evolution (DE) algorithm for hyperparameter tuning. We select Off-policy PPO because it reduces data requirements, increases RL efficiency, and suits settings where data collection is costly. In our research, Off-policy PPO is dynamically tuned to improve FS and class balance. It consistently surpasses conventional static approaches by adapting to the intricate dynamics of criminal suspect detection. Furthermore, the DE algorithm is enhanced with a novel mutation strategy that employs k-means clustering to identify key clusters. Our methodology is evaluated on four datasets: CelebFaces Attributes (CelebA), Labeled Faces in the Wild (LFW), Chinese Academy of Sciences Institute of Automation WebFace (CASIA-WebFace), and Visual Geometry Group Face 2 (VGGFace2). The method achieves F-measures of 89.409%, 91.152%, 92.184%, and 92.202% on these datasets, respectively. These results demonstrate that the approach outperforms existing methods and advances early suspect detection, while also improving investigative strategies.
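To make the idea of a k-means-guided DE mutation more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how the clusters enter the mutation step, so the sketch assumes that the population is clustered each generation, that the "key" cluster is the one with the lowest mean objective value, and that its centroid serves as the base vector. The function names `kmeans` and `de_kmeans`, the toy objective, and all parameter values (`k`, `F`, `CR`, population size) are hypothetical choices for illustration.

```python
import numpy as np


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def de_kmeans(objective, bounds, pop_size=20, k=3, F=0.5, CR=0.9,
              generations=60, seed=0):
    """Minimise `objective` with DE whose mutation base is a cluster centroid."""
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(low)
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fitness = np.array([objective(x) for x in pop])

    for _ in range(generations):
        centroids, labels = kmeans(pop, k, rng)
        # Assumption: the "key" cluster is the one with the lowest mean objective.
        scores = [fitness[labels == j].mean() if np.any(labels == j) else np.inf
                  for j in range(k)]
        base = centroids[int(np.argmin(scores))]

        for i in range(pop_size):
            candidates = [idx for idx in range(pop_size) if idx != i]
            r1, r2 = rng.choice(candidates, size=2, replace=False)
            # Mutation: centroid of the key cluster plus a scaled difference vector.
            mutant = np.clip(base + F * (pop[r1] - pop[r2]), low, high)
            # Binomial crossover; at least one coordinate comes from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial only if it is no worse.
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit

    best = int(np.argmin(fitness))
    return pop[best], fitness[best]


if __name__ == "__main__":
    # Toy stand-in for a validation loss over two hyperparameters.
    bounds = np.array([[-5.0, 5.0], [-5.0, 5.0]])
    x, fx = de_kmeans(lambda v: float(np.sum(v ** 2)), bounds)
    print(x, fx)
```

In a hyperparameter-tuning setting, `objective` would wrap a training-and-validation run and each coordinate would encode one hyperparameter. Because selection is greedy, the best fitness never gets worse from one generation to the next, although a centroid-based mutation of this kind may converge prematurely on harder objectives.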
