Comparison of Mask-R-CNN and Thresholding-Based Segmentation for High-Throughput Phenotyping of Walnut Kernel Color.
Steven H Lee, Sean McDowell, Charles Leslie, Kristina McCreery, Mason Earles, Patrick J Brown
Abstract
High-throughput phenotyping has become essential for plant breeding programs, replacing traditional methods that rely on subjective scales influenced by human judgment. Machine learning (ML) computer vision systems have successfully used convolutional neural networks (CNNs) for image segmentation, providing greater flexibility than thresholding methods that may require carefully staged images. This study compares two quantitative image analysis methods, rule-based thresholding using the magick package in R and an instance-segmentation pipeline based on the widely used Mask-R-CNN architecture, and then compares the output of each to two different sets of human evaluations. Walnuts were collected over three years from over 3000 individual trees maintained by the UC Davis walnut breeding program. The resulting 90,961 kernels were placed into 100-cell trays and imaged using a 20-megapixel Basler camera with a Sony IMX183 sensor. Quantitative data from both image analysis methods were highly correlated for both lightness (L*; r² = 0.997) and size (r² = 0.984). The thresholding method required many manual adjustments to account for minor discrepancies in staging, while the CNN method was robust after a rapid initial training on only 13 images. The two human scoring methods were not highly correlated with the image analysis methods or with each other. Pixel classification provides data similar to human color assessments but offers greater consistency across different years. The thresholding approach offers flexibility and has been applied to other color-based phenotyping tasks, while the CNN approach can be adapted to images that are not perfectly staged and be retrained to quantify more subtle kernel characteristics such as spotting and shrivel.
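To make the thresholding idea concrete, the following is a minimal Python sketch, not the authors' pipeline (which used the magick package in R). It assumes a single kernel photographed on a dark background in sRGB, separates kernel pixels from background with a lightness cutoff, and reports mean L* and pixel area. The function names and the cutoff value of 20 are hypothetical choices for illustration, and `r_squared` shows only one plausible way an agreement statistic between two methods could be computed.

```python
import numpy as np


def srgb_to_lightness(rgb):
    """Convert an sRGB image (uint8, H x W x 3) to CIELAB L* in the range 0-100."""
    c = rgb.astype(np.float64) / 255.0
    # Undo the sRGB transfer curve to obtain linear-light values.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y, scaled so that white has Y = 1.
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIE lightness function, with its linear segment near black.
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0


def kernel_stats(rgb, background_max_l=20.0):
    """Return (mean L*, pixel area) of pixels lighter than the background cutoff.

    background_max_l is a hypothetical threshold; a real setup would tune it
    to the tray, lighting, and camera.
    """
    lightness = srgb_to_lightness(rgb)
    mask = lightness > background_max_l  # True where a pixel is treated as kernel
    area = int(mask.sum())
    mean_l = float(lightness[mask].mean()) if area else float("nan")
    return mean_l, area


def r_squared(x, y):
    """Squared Pearson correlation between two sets of measurements."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)
```

A fixed cutoff like this is sensitive to staging and lighting, which is consistent with the manual adjustments the thresholding method required in the study.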