After all, what is the proper use of Kappa statistics in oral health surveys? What don't manuals tell us?
Andréa Videira Assaf, Renato Pereira da Silva, Fábio Luiz Mialhe, Antonio Carlos Pereira
Abstract
The aim of this study was to elucidate unclear points in the World Health Organization (WHO) manual "Oral Health Survey: basic methods" regarding reproducibility (encompassing reliability and agreement) during examiner calibration. To this end, Kappa statistics and percent agreement were calculated for a sample of ten 12-year-old schoolchildren examined by 1 gold standard examiner and 5 dentists from Nova Friburgo, RJ, Brazil, in 2018, under the WHO and SB Brasil 2010 Project settings. Weighted Kappa was used to measure reliability between 2 examiners, and Fleiss' Kappa among 5 examiners. Tooth-by-tooth reliability was also assessed. The results showed that, although different settings invariably produced different reliability and agreement values, varying the setting was feasible, coherent and even desirable, depending on the purpose of the epidemiological survey. Kappa values were slightly lower in the SB Brasil 2010 Project setting. The tooth-by-tooth results, in turn, identified the teeth (in this sample, teeth 17, 23, 27, 34, 37, 44, 45, and 47) for which additional examiner calibration would be necessary. It is concluded that adding information to the WHO manual, such as the possibility of varying the setting, adopting the tooth as the unit of analysis, and selecting the appropriate type of Kappa statistic for the number of examiners, within a multilevel calibration proposal, may yield more reliable results during the calibration stage.
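As an illustration of the three measures named in the abstract, the following minimal Python sketch computes percent agreement, a weighted Kappa for two examiners, and Fleiss' Kappa for several examiners. It is not the authors' code. The function names, the ordered integer codes, and the sample data are hypothetical, and the linear weighting scheme is an assumption, since the abstract does not state which weights were used.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of units (e.g. teeth) on which two examiners gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories):
    """Cohen's weighted kappa for two examiners.

    Assumption: linear disagreement weights over `categories`, which must be
    listed in a meaningful order and contain at least two codes.
    Not handled: zero expected disagreement (division by zero).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions and the marginals of each examiner.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of examiners per unit.

    `ratings` holds one list per unit, with one code per examiner.
    Not handled: all ratings in a single category (division by zero).
    """
    n_units = len(ratings)
    m = len(ratings[0])  # examiners per unit, assumed constant
    counts = [Counter(unit) for unit in ratings]
    # Per-unit agreement: proportion of examiner pairs that agree.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - m) / (m * (m - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_units
    # Chance agreement from the pooled category proportions.
    p_j = [sum(c[cat] for c in counts) / (n_units * m) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical codes for 8 teeth: gold standard vs. one examiner.
    gold = [0, 0, 1, 1, 2, 2, 0, 1]
    exam = [0, 0, 1, 2, 2, 2, 0, 0]
    print(percent_agreement(gold, exam))
    print(weighted_kappa(gold, exam, categories=[0, 1, 2]))
    # Hypothetical codes for 2 teeth, each rated by 3 examiners.
    print(fleiss_kappa([[0, 0, 1], [1, 1, 1]], categories=[0, 1]))
```

A tooth-by-tooth assessment of the kind described in the abstract could, under the same assumptions, call these functions once per tooth number, using only the codes recorded for that tooth across the examined children.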