Chinese medicine

Reinforcement learning for LLM-based explainable TCM prescription recommendation with implicit preferences from small language models.

Xinyu Wang, Xiaohe Sun, Lei Yang, Yitong Zhang, Tao Yang, Jiadong Xie, Kongfa Hu

Published: 2025

DOI: 10.1186/s13020-025-01250-7

Abstract

Open Access

OBJECTIVE: To improve both the interpretability and the accuracy of prescription recommendation in Traditional Chinese Medicine (TCM), this study proposes a two-stage training framework that combines knowledge distillation from a teacher model with implicit preference-driven reinforcement learning guided by a compact model.

METHODS: First, GPT-4o is used to parse structured TCM clinical records and produce high-quality distillation samples. These samples guide Low-Rank Adaptation (LoRA)-based fine-tuning of the Qwen2.5-7B model, enabling it to generate explainable outputs in the format "symptom analysis-prescription recommendation-prescription explanation". Second, a lightweight BART (Bidirectional and Auto-Regressive Transformers) model is trained to learn the mapping between symptoms and prescriptions. Its outputs are compared with those of the large model to construct preference pairs, which are then used in Direct Preference Optimization (DPO)-based reinforcement tuning to further align the large model with potentially better recommendations.

RESULTS: The proposed model achieves a P@30 of 35.62% and an F1@30 of 37.36%, outperforming existing baselines. Knowledge distillation improves the model's generalization and explainability, while implicit preference-based reinforcement improves F1@30 by a further 2.01%. Overall, the model improves on both accuracy and explainability.

CONCLUSION: The proposed approach not only improves the quality and transparency of TCM prescription recommendations, but also offers a promising strategy for building trustworthy and clinically applicable intelligent TCM decision-support systems.
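The abstract does not spell out how P@30 and F1@30 are computed or which DPO settings are used, so the following is only a minimal sketch under stated assumptions. It assumes the usual top-k definitions (precision is hits divided by k, recall is hits divided by the number of reference herbs) and the standard per-pair DPO objective. The function names, the herb labels, and `beta=0.1` are hypothetical and are not taken from the paper.

```python
import math


def precision_recall_f1_at_k(ranked_herbs, true_herbs, k=30):
    """Top-k precision, recall and F1 for a single prescription.

    Assumes `ranked_herbs` is ordered best-first and contains no duplicates.
    """
    top_k = ranked_herbs[:k]
    truth = set(true_herbs)
    hits = sum(1 for herb in top_k if herb in truth)
    precision = hits / k if k else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin)).

    Each argument is the summed log-probability of a full response under the
    tuned policy or the frozen reference model. `beta` is an assumed value.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Placeholder herb labels, not real data.
    predicted = ["herb_a", "herb_b", "herb_c", "herb_d"]
    reference = ["herb_a", "herb_c", "herb_e"]
    print(precision_recall_f1_at_k(predicted, reference, k=4))
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In this sketch a preference pair would put the response judged better in the "chosen" slot and the other in the "rejected" slot. How the paper decides which of the BART and large-model outputs is preferred is not stated in the abstract, so that step is left out.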
