Frontiers in public health

Humans; Resource Allocation; Algorithms; Epidemics; Markov Chains

Online bipartite matching methodology for anti-epidemic resources allocation: an adaptive time window based on reinforcement learning.

Zhiyong Wu, Sulin Pang, Suyan He

Published: 2025. DOI: 10.3389/fpubh.2025.1644499

Abstract

Open Access

Background: This study investigates the online matching problem for anti-epidemic resources among multiple suppliers and recipients in the Internet of Healthcare System during a major outbreak, accounting for the heterogeneity of supply and demand. Methods: A multi-stage online dynamic bipartite matching model based on time windows is developed, which can be reformulated as a Markov decision process. An adaptive time window batch bipartite matching algorithm based on reinforcement learning is proposed, which uses a nearest-neighbor-first heuristic strategy to allocate anti-epidemic resources. Results: At its optimal window size, the fixed time window batch matching strategy (FTWBM) outperforms the adaptive time window batch matching strategy (ATWBM). However, the ATWBM strategy adapts more effectively to dynamic changes in epidemic prevention and control, particularly in partially optimistic scenarios. Conclusions: As the matching time window expands, the average matching rate increases consistently, whereas the average waiting time first decreases and then rises again. This finding implies that health operations managers should adjust the matching time window in response to changing epidemic dynamics and resource availability.
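
The batch matching idea in the Methods can be illustrated with a minimal sketch. It covers only a fixed time window (in the spirit of FTWBM), not the reinforcement learning component that adapts the window. Everything in it is an assumption made for illustration rather than the authors' implementation: the names `Supplier`, `Request`, and `batch_match` are hypothetical, distance is taken to be Euclidean, each request consumes exactly one unit, and "nearest neighbor first" is read as "each pending request, in arrival order, takes the closest supplier that still has stock".

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Supplier:  # hypothetical type, not from the paper
    name: str
    x: float
    y: float
    units: int  # units of resource in stock


@dataclass
class Request:  # hypothetical type, not from the paper
    name: str
    arrival: float  # time at which the request appears
    x: float
    y: float


def batch_match(requests, suppliers, window):
    """Fixed time-window batch matching with a nearest-neighbor-first rule.

    Requests accumulate until the current window closes at k * window,
    then each pending request (in arrival order) is given the closest
    supplier with remaining stock. Unmatched requests carry over.
    Returns (matches, waits).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    stock = {s.name: s.units for s in suppliers}
    arrivals = sorted(requests, key=lambda r: r.arrival)
    matches, waits = [], {}
    pending, nxt, k = [], 0, 1
    while nxt < len(arrivals):
        close = k * window  # time at which batch k is matched
        while nxt < len(arrivals) and arrivals[nxt].arrival <= close:
            pending.append(arrivals[nxt])
            nxt += 1
        carried = []
        for r in pending:
            available = [s for s in suppliers if stock[s.name] > 0]
            if not available:
                carried.append(r)  # stays pending for the next batch
                continue
            best = min(available, key=lambda s: hypot(s.x - r.x, s.y - r.y))
            stock[best.name] -= 1
            matches.append((r.name, best.name))
            waits[r.name] = close - r.arrival  # waiting time until matched
        pending = carried
        k += 1
    return matches, waits


suppliers = [Supplier("A", 0.0, 0.0, 1), Supplier("B", 10.0, 0.0, 1)]
requests = [
    Request("r1", 1.0, 1.0, 0.0),
    Request("r2", 3.0, 2.0, 0.0),
    Request("r3", 4.0, 9.0, 0.0),
]
matches, waits = batch_match(requests, suppliers, window=2)
```

Under these assumptions, a window of 2 matches `r1` to `A` and `r2` to `B`, each after waiting one time unit, and leaves `r3` unmatched because stock has run out. Rerunning with a different `window` changes the waiting times, which is the trade-off between matching rate and waiting time that the abstract describes.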
