Scientific Reports

Optimizing thermoelectric energy harvesting using deep reinforcement learning for dynamic energy management and system efficiency.

Chirayu Nilesh Chaudhari, N J Rtamanyu, Naga Sai Shreya Kunda, K C Pranave, Mayank Pandey, S Sruthi, Manju Khanna

Published: 2025
DOI: 10.1038/s41598-025-27210-7

Abstract

Open Access

By addressing the drawbacks of static optimization techniques, this research seeks to improve the dynamic energy management of thermoelectric generators (TEGs). The goal is to identify the deep reinforcement learning (DRL) algorithm that best maximizes energy distribution, prolongs battery life, and boosts system efficiency under the variable conditions present in waste heat recovery systems. The TEG system was modeled as a Markov decision process and implemented in a computer-simulated environment. Several advanced DRL algorithms, including soft actor-critic (SAC), proximal policy optimization (PPO), and deep Q-networks (DQN), were trained to act as intelligent controllers. The performance of each algorithm was systematically evaluated and compared using key metrics, including average cumulative reward, battery health, system efficiency, and a novel metric termed the energy fulfillment rate, which measures the ability to meet demand while storing surplus energy. The comparative analysis revealed a critical trade-off between maximizing performance and ensuring hardware longevity. The SAC algorithm demonstrated the best overall performance, achieving the highest average reward (-7.03) and energy fulfillment rate (22.84%). However, the advantage actor-critic (A2C), deep deterministic policy gradient (DDPG), and PPO algorithms all achieved a perfect average battery health of 100.00%, highlighting their superior capability for preserving system longevity, albeit with slightly lower rewards. The DQN algorithm consistently showed the least effective performance across all metrics, particularly in maintaining battery health (60.73%). Overall, SAC is the most suitable of the methods tested for dynamically managing TEG systems. Its underlying principle of entropy maximization enables better exploration of control strategies, leading to a better balance between immediate energy dispatch and long-term storage goals.
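For reference, the entropy-maximization principle mentioned above is usually written as the standard maximum-entropy reinforcement learning objective that SAC optimizes. The form below is the textbook one; the symbols (policy \pi, reward r, temperature \alpha, horizon T) are generic notation and are not taken from this paper, whose exact reward and hyperparameters are not given in the abstract.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

A larger \alpha weights the entropy term more heavily and so favors more exploratory policies. As \alpha approaches zero, the objective reduces to the conventional expected cumulative reward.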
The findings confirm the significant potential of DRL to create efficient and adaptive controllers for renewable energy applications, although further validation on physical hardware is required to confirm real-world viability.
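To make the Markov decision process framing and the energy fulfillment rate more concrete, the following is a minimal Python sketch of a single control step and one possible way to compute the metric. Everything in it is an illustrative assumption rather than the authors' implementation: the state fields, the `dispatch_fraction` action, the 10 Wh capacity, the unmet-demand reward, and the definition of the fulfillment rate as the percentage of steps in which demand is fully met are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class TEGState:
    """Hypothetical MDP state: battery charge, TEG output, and load demand (Wh)."""
    battery: float
    generated: float
    demand: float


def step(state: TEGState, dispatch_fraction: float, capacity: float = 10.0):
    """Apply one control action and return (next_battery, delivered, reward).

    dispatch_fraction in [0, 1] is the share of generated energy offered to
    the load; whatever is not delivered is stored, clipped to the capacity.
    """
    dispatch_fraction = min(max(dispatch_fraction, 0.0), 1.0)
    # The load never receives more than it asks for.
    delivered = min(state.generated * dispatch_fraction, state.demand)
    surplus = state.generated - delivered
    next_battery = min(state.battery + surplus, capacity)
    # Assumed reward: penalize unmet demand only (not the paper's reward).
    reward = -(state.demand - delivered)
    return next_battery, delivered, reward


def energy_fulfillment_rate(delivered, demand):
    """Assumed definition: percentage of steps in which demand is fully met."""
    met = sum(1 for d, q in zip(delivered, demand) if d >= q)
    return 100.0 * met / len(demand)


if __name__ == "__main__":
    state = TEGState(battery=2.0, generated=4.0, demand=3.0)
    print(step(state, dispatch_fraction=0.5))
    print(energy_fulfillment_rate([2.0, 3.0, 1.0, 5.0], [3.0, 3.0, 1.0, 6.0]))
```

In a full study, a DRL agent such as SAC would choose `dispatch_fraction` at every step from the observed state, and the metric would be averaged over many simulated episodes.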