A multimodal hyperspectral dataset of cocoa beans with physicochemical annotation
Kebin Contreras, Mohamad Jouni, Mauro Dalla Mura, Jorge Bacca
Abstract
Assessing cocoa bean quality using spectral information offers a non-invasive and objective alternative to traditional methods, which are often subjective and destructive. However, progress has been limited by the lack of comprehensive datasets across multiple spectral resolutions. This work presents a new dataset capturing the spectral properties of cocoa beans at different spatiospectral resolutions, enabling non-invasive quality assessment and scalable evaluation methodologies. It comprises 19 scenes acquired with four imaging devices under both open (invasive) and closed (non-invasive) conditions, along with corresponding physicochemical measurements. Data collection follows the Colombian standard NTC 1252:2021, which labels beans as well, partially, or poorly fermented. Three global physicochemical properties (moisture, polyphenols, and cadmium) were measured using gravimetric analysis, UV-visible spectroscopy, and atomic absorption spectroscopy with microwave digestion, respectively. Hyperspectral images were obtained with the four devices, which cover up to the 350-1000 nm spectral range. Statistical analysis shows that the dataset distinguishes between cocoa quality levels under both open and closed conditions, supporting the development of automated classification methods.