High-dimensional continuous action space control via trust region optimized deep reinforcement learning.
Xia Wang
Abstract
This work introduces the Adaptive Trust Region Policy Optimization for Action Space Compression (ATRPO-ACS) framework, a deep reinforcement learning approach that uses trust region optimization to address adaptive control in high-dimensional continuous action spaces. By combining distributed KL constraint optimization with manifold projection and residual compensation, the framework improves sampling efficiency and real-time performance while reducing trajectory tracking errors and voltage limit violations. In experimental validation, robotic arm tracking errors were kept within ±0.08 mm and microgrid scheduling costs were reduced by 28.5%. The framework also notably shortened production cycles on automotive welding lines. These results provide theoretical and technical support for real-time optimization control in industrial intelligent systems.
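
For background, the trust region idea referred to above can be stated through the standard TRPO problem, in which each policy update maximizes a surrogate objective subject to a bound on the average KL divergence from the previous policy. This is the generic formulation only, given as a sketch: the symbols $\pi_\theta$, $\pi_{\theta_{\text{old}}}$, $\hat{A}$, and $\delta$ follow common usage and are assumptions rather than notation taken from this paper, and the distributed KL constraint used in ATRPO-ACS may differ from it.

$$
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,\hat{A}(s,a)\right] \\
\text{subject to}\quad & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[D_{\mathrm{KL}}\!\left(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\big\|\,\pi_\theta(\cdot \mid s)\right)\right] \le \delta
\end{aligned}
$$

Here $\hat{A}(s,a)$ is an advantage estimate and $\delta$ is the trust region radius: a smaller $\delta$ keeps successive policies closer together, trading update size for stability.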