Studies in health technology and informatics
Keywords: Artificial Intelligence; Reproducibility of Results; Humans; Observer Variation
Evaluating AI-Powered Q&A Systems: A Simple Approach to Determining the Need for Expert Ratings.
Dorian Zwanzig, Luca Kreibich, Uta Binder, Ute Dietrich
Published: 2025
DOI: 10.3233/SHTI251532
Abstract
This paper introduces a simple approach for assessing whether laypeople or AI-based automations can adequately substitute for expert ratings in the evaluation of AI-powered Q&A systems. It employs weighted Cohen's Kappa to assess inter-rater reliabili…
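As an illustration of the statistic named in the abstract, the following is a minimal sketch of weighted Cohen's Kappa for two raters scoring the same answers on an ordinal scale. The abstract preview does not state the rating scale or the weighting scheme, so the 1 to 5 scale, the linear/quadratic weight options, and the names `weighted_kappa`, `expert`, and `layperson` are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordered scale.

    `categories` lists the scale values in order (hypothetical 1-5 scale
    in the usage below). `weights` is "linear" or "quadratic"; the source
    abstract does not say which scheme the authors used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the two extremes of the scale.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Joint counts of (rating by A, rating by B) and each rater's marginals.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Observed and chance-expected weighted disagreement.
    observed = sum(disagreement(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j] / (n * n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        # Both raters used one and the same category throughout.
        return 1.0
    return 1.0 - observed / expected


# Hypothetical ratings of six Q&A answers on a 1-5 scale.
expert = [5, 4, 4, 2, 1, 3]
layperson = [4, 4, 5, 2, 2, 3]
print(weighted_kappa(expert, layperson, categories=[1, 2, 3, 4, 5]))
```

A value near 1 indicates that the second rater closely tracks the expert, while a value near 0 indicates agreement no better than chance. Any cut-off for deciding that expert ratings can be replaced would have to come from the full paper.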