Masked image modeling in medical hyperspectral imaging: reconstruction evaluation and downstream tasks.
Kelden Pruitt, Hemanth Pasupuleti, James Yu, Weston DeAtley, Baowei Fei
Abstract
Self-supervised pre-training has been shown to improve deep learning networks in tasks such as natural language processing and computer vision. Although the approach has shown promise across fields, further development and translation to medical imaging applications are needed. The current literature also rarely assesses pre-training approaches thoroughly, which may hinder performance in downstream tasks. In this work, we apply a state-of-the-art pre-training architecture to hyperspectral imaging (HSI) to encode spatial and spectral features of various ex vivo tissues. We use a masked image modeling scheme to pre-train on an internal dataset captured with a high-speed hyperspectral laparoscopic imaging system. Our network applies spectral and spatial attention sequentially, factorizing the model for efficiency. Both pre-training and finetuned classification were evaluated on a validation dataset held out from both the pre-training and finetuning sets to prevent data leakage. Pre-training results are assessed qualitatively through reconstruction visualization and quantitatively with mean absolute error (MAE), which reached 0.0294 on the validation dataset. To test the capabilities of the pre-trained model, we finetuned the network as an abdominal tissue classifier, achieving 87.9% accuracy on 17 classes with frozen model weights. Overall, we present a masked autoencoding framework for pre-training on hyperspectral images, with an emphasis on evaluating the network for potential improvements in downstream tasks such as tissue classification and segmentation.
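To illustrate how a masked reconstruction can be scored with mean absolute error, the following is a minimal NumPy sketch and not the authors' implementation. The function names `random_patch_mask` and `masked_mae`, the patch size, the masking ratio, and the choice to average the error over masked pixels only are all assumptions made for illustration; the abstract does not specify them. The image height and width are assumed to be divisible by the patch size.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a boolean (height, width) mask; True marks hidden pixels.

    Hypothetical helper: hides a random subset of non-overlapping
    square patches. Assumes height and width are divisible by patch.
    """
    grid_h, grid_w = height // patch, width // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand each per-patch decision to a patch x patch block of pixels.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def masked_mae(cube, recon, mask):
    """Mean absolute error over masked pixels, across all spectral bands.

    cube, recon: arrays of shape (height, width, bands).
    mask: boolean array of shape (height, width).
    Assumption: the error is averaged over masked pixels only.
    """
    abs_err = np.abs(cube - recon)
    return float(abs_err[mask].mean())


# Synthetic stand-in for a hyperspectral cube (not real data).
rng = np.random.default_rng(0)
cube = rng.random((32, 32, 16))
mask = random_patch_mask(32, 32, patch=8, mask_ratio=0.75, rng=rng)

# A fake "reconstruction" that is off by a constant 0.03 everywhere.
recon = cube + 0.03
print(masked_mae(cube, recon, mask))
```

In practice `recon` would be the output of the pre-trained network on the masked input; the constant offset here only makes the metric easy to check by hand.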