Scientific Reports

Recommendation of deep reinforcement learning based on value function considering error reduction.

JinLian Zhou, DeRong Shen, Ying Guo, Yan Wu, JianHua Ma

Published: 2025

DOI: 10.1038/s41598-025-18926-7

Abstract

Open Access

Deep reinforcement learning (DRL) algorithms are widely applied in user cold-start recommender systems because they can progressively capture users' dynamic interests. Among reinforcement learning (RL) methods, Deep Q-Networks (DQN) have become the most popular owing to their simple update strategy and strong performance. In many user cold-start scenarios, the action space shrinks over time, because items already recommended are removed to avoid showing users duplicates. However, current DQN-based recommender systems always produce outputs over the entire, fixed action space, which inevitably mismatches the gradually shrinking set of available actions. This paper demonstrates that this mismatch introduces an action-space reduction error into the temporal difference (TD) update of the original RL formulation, making Q-value estimation in standard DQN methods inaccurate. Moreover, in long-term recommendation scenarios, interaction lengths differ substantially across users, so the error cannot be ignored, which limits the applicability of these methods when the action space gradually shrinks. To address this issue, this paper introduces a new algorithm called Q-AD (Q-learning Action Decrease), which is based on DQN and aims to mitigate the action-space reduction error by buffering the Q-value estimation error at each update. Q-AD augments the standard DQN with an error reduction term for TD updates. Experiments show that Q-AD significantly reduces value estimation errors and achieves better accuracy and efficiency than previous methods across different datasets.
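The mismatch described in the abstract can be illustrated with a small sketch. This is not the Q-AD algorithm: the abstract does not specify the form of its error reduction term, so none is implemented here. The sketch only contrasts a TD target that bootstraps from the full action space with one restricted to items not yet recommended. The function names, Q-values, reward, and discount factor are all hypothetical and chosen for illustration.

```python
from typing import Sequence, Set


def td_target_full(reward: float, gamma: float, next_q: Sequence[float]) -> float:
    """Standard DQN-style target: bootstrap from the max over every action."""
    return reward + gamma * max(next_q)


def td_target_masked(
    reward: float, gamma: float, next_q: Sequence[float], recommended: Set[int]
) -> float:
    """Target that bootstraps only from items not yet recommended."""
    remaining = [q for action, q in enumerate(next_q) if action not in recommended]
    if not remaining:
        # Assumption: with no items left, the episode is treated as terminal.
        return reward
    return reward + gamma * max(remaining)


# Hypothetical next-state Q-values for four items; item 0 was already recommended.
next_q = [0.9, 0.4, 0.7, 0.1]
recommended = {0}

full = td_target_full(1.0, 0.9, next_q)
masked = td_target_masked(1.0, 0.9, next_q, recommended)

# The gap is the overestimation caused by bootstrapping from an unavailable item.
gap = full - masked
print(f"full={full:.2f} masked={masked:.2f} gap={gap:.2f}")
```

In this toy case the full-space target bootstraps from item 0, which can no longer be recommended, so it exceeds the masked target. The gap depends on which items have been removed, which is consistent with the abstract's point that the error varies with interaction length.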
