Scientific Reports

Subjects: Algorithms, Humans, Cyberbullying, Language, Social Media

An optimized Arabic cyberbullying detection approach based on genetic algorithms

Aya M Eissa, Shawkat K Guirguis, Magda M Madbouly

Published: 2025
DOI: 10.1038/s41598-025-23586-8

Open Access

Abstract

The rise of cyberbullying on digital communication platforms has raised widespread concern, not only because of its reach but also because of the lasting psychological harm it causes. Detecting such behavior online is difficult in general, and it is harder still when the target language is Arabic. Arabic is written in multiple dialects, each with its own informal vocabulary, spelling variations, and structure. Moreover, meaning often shifts with region, tone, and social context, which makes abusive content harder to catch with conventional tools. This study aims to improve Arabic cyberbullying detection through a feature-selection strategy. Its main contribution is a Genetic Algorithm (GA)-based feature selector that pinpoints harmful language patterns in a corpus of about 46,000 Arabic Instagram comments. The GA reduced the feature space by approximately half, preserving essential semantic structures while removing noise and redundancy. Four classifiers were evaluated; GA-driven selection improved their F1-scores by 3.45% to 14.96% and reduced classification time by a factor of 2.32 to 12. These findings suggest that genetic feature optimization improves detection performance while substantially reducing runtime and model complexity, enabling scalable, context-sensitive cyberbullying detection for Arabic and other morphologically rich languages.
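The abstract does not specify the GA's encoding, operators, or fitness function. A common way to apply a GA to feature selection, sketched here under stated assumptions, encodes each candidate feature subset as a binary mask (one bit per feature) and evolves a population of masks with selection, crossover, and mutation. The `toy_fitness` function below is a hypothetical stand-in. In a wrapper setup, it would presumably be replaced by the F1-score of a classifier trained on the selected features and scored on held-out comments, possibly with a penalty on subset size. All names (`ga_select`, `toy_fitness`, `INFORMATIVE`) and settings (population size, rates, tournament selection, single-point crossover, elitism) are illustrative choices, not the authors' configuration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # one bit per candidate feature: 1 = keep, 0 = drop


def tournament(pop: Sequence[Mask], scores: Sequence[float],
               rng: random.Random, k: int = 3) -> Mask:
    """Return a copy of the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    winner = max(contenders, key=lambda i: scores[i])
    return list(pop[winner])


def crossover(a: Mask, b: Mask, rng: random.Random) -> Tuple[Mask, Mask]:
    """Single-point crossover: swap the tails of two parent masks."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 30, generations: int = 40,
              crossover_rate: float = 0.9, mutation_rate: float = 0.02,
              elite: int = 2, seed: int = 0) -> Tuple[Mask, float, List[float]]:
    """Evolve feature masks (illustrative settings, not the paper's).

    Returns the best mask, its fitness, and the best fitness per generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        history.append(max(scores))
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        # Elitism: carry the top masks over unchanged.
        next_pop = [list(pop[i]) for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if rng.random() < crossover_rate:
                c1, c2 = crossover(p1, p2, rng)
            else:
                c1, c2 = p1, p2
            next_pop.append(mutate(c1, mutation_rate, rng))
            if len(next_pop) < pop_size:
                next_pop.append(mutate(c2, mutation_rate, rng))
        pop = next_pop
    scores = [fitness(m) for m in pop]
    history.append(max(scores))
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best], history


# Hypothetical toy problem: 40 features, of which only the first 10 carry signal.
INFORMATIVE = set(range(10))


def toy_fitness(mask: Mask) -> float:
    """Stand-in for validation F1: reward informative features, penalize size."""
    kept = [i for i, bit in enumerate(mask) if bit]
    if not kept:
        return 0.0
    hits = sum(1 for i in kept if i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.01 * len(kept)


best_mask, best_score, history = ga_select(40, toy_fitness)
print(f"kept {sum(best_mask)} of 40 features, fitness {best_score:.3f}")
```

Because the top two masks are copied unchanged into each new generation, the best fitness recorded in `history` never decreases. The size penalty in the fitness is what pushes the search toward smaller subsets, which is one way a selector could end up discarding a large share of the feature space.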
