A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation.
Ziyang Chen, Erxue Min, Xiang Zhao, Yunxin Li, Xin Jia, Jinzhi Liao, Jichao Li, Shuaiqiang Wang, Baotian Hu, Dawei Yin
Abstract
We introduce ChronoQA, a benchmark dataset for Chinese question answering focused on evaluating temporal reasoning in Retrieval-Augmented Generation (RAG) systems. Built from over 300,000 news articles published between 2019 and 2024, ChronoQA contains 5,176 questions covering absolute, aggregate, and relative temporal types, with both explicit and implicit time expressions. The dataset features both single- and multi-document scenarios, reflecting real-world requirements for temporal alignment and logical consistency. By providing structured evaluation across a wide range of temporal tasks, ChronoQA offers a dynamic, reliable, and scalable resource for benchmarking RAG systems in evolving knowledge environments.