Studies in health technology and informatics

Artificial Intelligence; Reproducibility of Results; Humans; Observer Variation

Evaluating AI-Powered Q&A Systems: A Simple Approach to Determining the Need for Expert Ratings.

Dorian Zwanzig, Luca Kreibich, Uta Binder, Ute Dietrich

Published: 2025

DOI: 10.3233/SHTI251532

Abstract

This paper introduces a simple approach for assessing whether laypeople or AI-based automations can adequately substitute for expert ratings in the evaluation of AI-powered Q&A systems. It employs weighted Cohen's Kappa to assess inter-rater reliabili…
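
As a rough illustration of the statistic named in the abstract, the following is a minimal sketch of weighted Cohen's Kappa for two raters on an ordinal scale. The preview does not state the paper's rating scale, weighting scheme, or data, so everything here is an assumption: the function name `weighted_kappa`, the choice between linear and quadratic disagreement weights, the 1 to 5 scale, and the example ratings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Weighted Cohen's Kappa for two raters on an ordinal scale.

    `categories` lists the scale values in ascending order.
    `weighting` is "linear" or "quadratic" (an assumed choice; the
    paper's scheme is not given in the preview).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty item set")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(ratings_a)
    power = 1 if weighting == "linear" else 2

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
        return (abs(i - j) / (k - 1)) ** power

    # Joint counts of (rater A category, rater B category) and marginals.
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marginal_a = Counter(index[a] for a in ratings_a)
    marginal_b = Counter(index[b] for b in ratings_b)

    # Weighted observed disagreement.
    observed_dis = sum(weight(i, j) * c / n for (i, j), c in observed.items())
    # Weighted disagreement expected by chance from the marginals.
    expected_dis = sum(
        weight(i, j) * marginal_a[i] * marginal_b[j] / (n * n)
        for i in range(k)
        for j in range(k)
    )
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_dis / expected_dis


# Hypothetical ratings of eight Q&A answers on an assumed 1-5 scale.
expert = [5, 4, 4, 2, 1, 3, 5, 2]
layperson = [4, 4, 5, 2, 2, 3, 5, 1]
kappa = weighted_kappa(expert, layperson, categories=[1, 2, 3, 4, 5])
print(f"weighted kappa (linear): {kappa:.3f}")
```

Under this sketch, a value near 1 would indicate that the substitute rater closely tracks the expert, while a value near 0 would indicate agreement no better than chance. Any threshold for deciding that expert ratings are unnecessary would have to come from the full paper.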
