Scientific Reports

A language model-based approach to sentiment classification of languages in Central Asia

Palidan Muhetaer, Guo Wenqiang, Lu Chong

Published: 2025
DOI: 10.1038/s41598-025-22198-6

Abstract

Open Access

Xinjiang lies in the hinterland of the Eurasian continent, borders Kazakhstan and Kyrgyzstan, and serves as a bridgehead linking China to the west. Sentiment classification of Central Asian languages has therefore become a development focus for the Belt and Road Initiative in Xinjiang. This paper uses the Kazakh-language editions of CHINA DAILY of Xinjiang News and SILK ROAD of Asia-Europe News as the source of a Central Asian language corpus, and builds the corpus together with an online retrieval platform. Word2Vec-TF-IDF and the Transformer-based BERT model are used, respectively, to train word vectors for Central Asian regional languages and to construct word vector features. The word vector features produced by the pre-trained language model are fed into a multi-channel CNN to extract local features of the Central Asian languages, and a bidirectional GRU (BiGRU) extracts their global features. An attention mechanism then fuses the local and global features, and a softmax classifier outputs the sentiment polarity of the text. On the Central Asian regional language corpus, the proposed model achieves better classification results than the other models, with accuracies of 92.78%, 91.45%, and 93.54%, and its training time for sentiment classification of Central Asian regional languages is 359.71 s. Using a language model as the basis for sentiment classification of Central Asian regional languages can help track changes in language sentiment during the implementation of the Belt and Road Initiative in Xinjiang.
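The abstract does not give the exact weighting or fusion formulas, so the following is only a minimal pure-Python sketch of two steps it names, under stated assumptions. TF-IDF weights (with an assumed smoothed IDF) combine word vectors into a document feature. An assumed dot-product attention over two feature vectors, standing in for the CNN local and BiGRU global outputs, yields a fused vector, and a linear layer plus softmax maps that vector to sentiment probabilities. The function names, the IDF variant, the query vector, and the toy embeddings are hypothetical illustrations, not the authors' implementation. The BERT, CNN, and BiGRU stages themselves are omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF weight for each distinct token in `doc` (smoothed IDF is an assumed variant)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # always > 0
        weights[term] = (count / len(doc)) * idf
    return weights


def weighted_doc_vector(doc: List[str], corpus: List[List[str]],
                        embeddings: Dict[str, List[float]]) -> List[float]:
    """Average of word vectors weighted by TF-IDF; tokens without a vector are skipped."""
    weights = tfidf_weights(doc, corpus)
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for term, w in weights.items():
        vec = embeddings.get(term)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            acc[i] += w * x
        total += w
    return [x / total for x in acc] if total > 0 else acc


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]


def attention_fuse(local: List[float], global_: List[float],
                   query: List[float]) -> List[float]:
    """Fuse local and global features with dot-product attention against a query vector."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in (local, global_)]
    alpha = softmax(scores)  # one attention weight per feature vector
    return [alpha[0] * a + alpha[1] * b for a, b in zip(local, global_)]


def classify(fused: List[float], weight: List[List[float]],
             bias: List[float]) -> List[float]:
    """Linear layer followed by softmax, giving one probability per sentiment class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy corpus and 2-D embeddings, purely illustrative.
    corpus = [["good", "news", "road"], ["bad", "news"], ["good", "road", "trade"]]
    embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "news": [0.0, 1.0],
                  "road": [0.0, 0.5], "trade": [0.5, 0.5]}
    doc_vec = weighted_doc_vector(corpus[0], corpus, embeddings)
    fused = attention_fuse(doc_vec, [0.0, 1.0], query=[1.0, 0.0])
    print(classify(fused, [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]))
```

Softmax-normalized attention weights keep the fused vector a convex combination of the two inputs. In the paper's model, the inputs would instead be learned CNN and BiGRU outputs, and the query and classifier weights would be trained parameters.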
