Food Chemistry: X

Integrated Metabolomics-KPCA-Machine Learning framework: a solution for geographical traceability of Chinese Jujube.

Xiaoli Wang, Xiaolei Ma, Yuxin Liu, Wenhan Tao, Yuting Zuo, Yueqin Zhu, Feng Hua, Chanming Liu, Wei Huang

Published: 2025. DOI: 10.1016/j.fochx.2025.103069

Abstract

Open Access

Chinese jujube (CJ), a crop of global economic importance valued for its nutritional and medicinal properties, is widely adulterated, which makes reliable geographical traceability difficult. This study introduces a Metabolomics-Kernel Principal Component Analysis (KPCA)-Machine Learning (ML) framework to establish an origin identification system for CJ from six production regions in China (Xinjiang, Gansu, Shaanxi, Henan, Shandong, and Hebei). Untargeted metabolomics by LC-MS/MS identified 312 metabolites, and multivariate analysis revealed 37 key discriminant variables (VIP > 1). KPCA compressed these features into 28 principal components, retaining 90.59% of the information. Compared with the traditional method, K-means clustering after KPCA dimensionality reduction greatly improved sample differentiation: in the original data, samples from different origins overlapped with fuzzy boundaries, whereas after dimensionality reduction the samples from the six origins formed clear, compact clusters, enabling accurate classification. This study pioneers a "Metabolomics-KPCA-ML" paradigm, offering a solution for the traceability of geographical indication agricultural products.
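The dimensionality-reduction step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the kernel, its parameters, or how the number of components was chosen, so the RBF kernel, the `gamma` value, the "smallest number of components reaching a retained-variance threshold" rule, and the helper names `rbf_kernel` and `kpca_scores` are all assumptions. The data are synthetic (60 samples, 37 features, six groups) and only mimic the shape of the problem.

```python
import numpy as np


def rbf_kernel(X, gamma):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def kpca_scores(X, gamma, retain=0.90):
    """Project X onto its leading kernel principal components.

    Keeps the smallest number of components whose eigenvalues sum to at
    least `retain` of the total (an assumed selection rule).
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Centre the kernel matrix in feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; flip to descending.
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    ratio = np.cumsum(vals) / vals.sum()
    k = min(int(np.searchsorted(ratio, retain)) + 1, n)
    # Scores of the training samples: eigenvectors scaled by sqrt(eigenvalue).
    scores = vecs[:, :k] * np.sqrt(vals[:k])
    return scores, float(ratio[k - 1])


# Synthetic stand-in for the 37 discriminant variables (hypothetical data).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(6), 10)          # six groups, ten samples each
centers = rng.normal(scale=3.0, size=(6, 37))
X = centers[labels] + rng.normal(size=(60, 37))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # autoscale each variable

scores, retained = kpca_scores(X, gamma=1.0 / X.shape[1], retain=0.90)
print(scores.shape, round(retained, 4))
```

The resulting `scores` matrix could then be passed to a clustering routine, for example scikit-learn's `KMeans(n_clusters=6)`, to compare cluster separation before and after the reduction.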
