Reinforcement learning for LLM-based explainable TCM prescription recommendation with implicit preferences from small language models.