Predicting adolescent depressive symptoms using teacher-reported textual descriptions of abnormal behaviors: a study based on machine learning
Nigela Wumaierjiang, Guoli Yan, Lidan Yuan, Jianan Song, Xiaofei Hou, Minghui Li, Ling Sun, Jiansong Zhou, Huifang Yin, Guangming Xu
Abstract
Objective: This study aimed to develop and compare machine learning (ML) models for predicting depressive symptoms in adolescents from teacher-reported textual descriptions of student behaviors. Methods: Participants were 441 adolescents from Tianjin, China. Their teachers provided written reports on the students' behavioral or emotional concerns, and the students completed the Patient Health Questionnaire-9 (PHQ-9). The report texts were converted into features using Term Frequency-Inverse Document Frequency (TF-IDF). Four ML models were trained and evaluated using an 80/20 train-test split and 5-fold cross-validation: Random Forest (RF), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), and Least Absolute Shrinkage and Selection Operator (LASSO). Results: PHQ-9 screening classified 71.7% (n = 316) of the adolescents as having clinically significant depressive symptoms (score ≥10). The RF model performed best, with a recall of 0.97, an accuracy of 0.91, a precision of 0.92, and an F1-score of 0.92. SVM and XGBoost also performed well, whereas LASSO performed worst. Teacher reports therefore identified adolescents with depressive symptoms with a recall of up to 97%. Conclusion: Machine learning, particularly RF, can effectively predict adolescent depressive symptoms from teacher-reported text. This approach offers a practical and efficient tool for early identification in school settings, facilitating timely intervention.
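The pipeline summarized in the Methods can be sketched in four steps. First, each student is labeled from the PHQ-9 total using the ≥10 cutoff. Second, the report texts are turned into TF-IDF vectors. Third, the data are split 80/20, with 5-fold cross-validation folds drawn from the training portion. Fourth, predictions are scored with accuracy, precision, recall, and F1. The Python sketch below uses only the standard library and is not the authors' code. Several assumptions are not stated in the abstract:

- The whitespace tokenizer is a placeholder. Chinese-language reports would need a word segmenter.
- The TF-IDF variant (raw counts, smoothed IDF, L2 normalization) mirrors scikit-learn's `TfidfVectorizer` defaults.
- The 5-fold cross-validation is assumed to run inside the 80% training portion.
- All function names are hypothetical.

```python
import math
import random
from collections import Counter

PHQ9_CUTOFF = 10  # threshold stated in the abstract (total score >= 10)


def phq9_label(item_scores):
    """Return 1 if the PHQ-9 total (nine items, each scored 0-3) meets the cutoff, else 0."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores in the range 0-3")
    return int(sum(item_scores) >= PHQ9_CUTOFF)


def tokenize(text):
    # Hypothetical placeholder: lowercase and split on whitespace.
    # Chinese-language reports would need a proper word segmenter instead.
    return text.lower().split()


def tfidf(docs):
    """TF-IDF with raw term counts, smoothed IDF ln((1+n)/(1+df))+1 and L2-normalized rows."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def train_test_split_indices(n, test_frac=0.2, seed=0):
    """Shuffle 0..n-1 and return (train, test) index lists for an 80/20 split by default."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold_splits(indices, k=5):
    """Yield (train, validation) lists by cutting `indices` into k contiguous folds."""
    n = len(indices)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def binary_metrics(y_true, y_pred):
    """Accuracy, and precision/recall/F1 for the positive (depressive-symptom) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy inputs, not study data.
    reports = ["withdrawn and tearful in class", "talkative and cheerful", "tearful and tired"]
    vocab, X = tfidf(reports)
    print(len(vocab), len(X))
    print(phq9_label([2, 1, 1, 2, 1, 1, 1, 0, 1]))  # total 10 meets the cutoff -> 1
    train_idx, test_idx = train_test_split_indices(441)
    print(len(train_idx), len(test_idx))  # 353 88
    for fold_train, fold_val in kfold_splits(train_idx, k=5):
        print(len(fold_train), len(fold_val))
    print(binary_metrics([1, 1, 0, 1], [1, 0, 0, 1]))
```

The classifiers themselves are omitted. In practice, the TF-IDF vectors would be passed to library implementations. Examples include scikit-learn's `RandomForestClassifier` and `SVC`, `xgboost.XGBClassifier`, and an L1-penalized `LogisticRegression` as a LASSO-style classifier. The abstract does not describe how the authors configured these models.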