Busov I D, Genaev M A, Komyshev E G, Koval V S, Zykova T E, Glagoleva A Y, Afonnikov D A
Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia.
Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
Vavilovskii Zhurnal Genet Selektsii. 2024 Jul;28(4):443-455. doi: 10.18699/vjgb-24-50.
Hyperspectral image analysis is of great interest in plant research, and its use is growing steadily, which makes the development of hyperspectral image processing methods an urgent task. This paper presents a hyperspectral image processing pipeline that covers preprocessing, basic statistical analysis, visualization of multichannel hyperspectral images, and classification and clustering with machine learning methods. The current version of the package implements the following methods: construction of a confidence interval of arbitrary level for the difference of sample means; testing the similarity of spectral-line intensity distributions between two sets of hyperspectral images using the Mann-Whitney U test and Pearson's chi-squared goodness-of-fit test; visualization in two-dimensional space using the dimensionality reduction methods PCA, ISOMAP and UMAP; classification using linear or ridge regression, random forest and CatBoost; and clustering of samples using the EM algorithm. The pipeline is implemented in Python using the Pandas, NumPy, OpenCV, SciPy, Sklearn, Umap, CatBoost and Plotly libraries. The source code is available at: https://github.com/igor2704/Hyperspectral_images. The pipeline was applied to identify melanin pigment in the shell of barley grains from hyperspectral data. Visualization with PCA, UMAP and ISOMAP, together with clustering, showed that grain samples with and without pigmentation can be linearly separated with high accuracy on the basis of hyperspectral data. The analysis revealed statistically significant differences in the distribution of median intensities between images of pigmented and unpigmented grains. Thus, hyperspectral images can be used to determine the presence or absence of melanin in barley grains with high accuracy. The flexible and convenient tool developed in this work is expected to significantly increase the efficiency of hyperspectral image analysis.
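The statistical comparison described above can be illustrated with a minimal sketch. It is not taken from the published package: the function names `mean_diff_confidence_interval` and `compare_band` are hypothetical, the data are synthetic, and the confidence interval is built with Welch's t-based approach, which is an assumption, since the abstract does not state which construction the pipeline uses. Only NumPy and SciPy calls are used.

```python
import numpy as np
from scipy import stats


def mean_diff_confidence_interval(a, b, level=0.95):
    """Confidence interval for mean(a) - mean(b).

    Assumption: Welch's t-interval (unequal variances); the published
    pipeline may construct its interval differently.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error contributed by each sample
    var_a = a.var(ddof=1) / a.size
    var_b = b.var(ddof=1) / b.size
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (a.size - 1) + var_b ** 2 / (b.size - 1)
    )
    half_width = stats.t.ppf(0.5 + level / 2.0, dof) * se
    diff = a.mean() - b.mean()
    return diff - half_width, diff + half_width


def compare_band(a, b, level=0.95):
    """Compare per-image median intensities of one spectral band."""
    low, high = mean_diff_confidence_interval(a, b, level)
    # Two-sided Mann-Whitney U test for a difference in distributions
    u_stat, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {"ci": (low, high), "u": float(u_stat), "p": float(p_value)}


# Synthetic stand-in for median intensities in a single band
# (hypothetical values, not data from the study).
rng = np.random.default_rng(0)
pigmented = rng.normal(0.30, 0.05, size=40)
unpigmented = rng.normal(0.45, 0.05, size=40)

result = compare_band(pigmented, unpigmented)
print(result)
```

In a real analysis the two input arrays would hold one value per image (for example the median intensity of a segmented grain in a given band), and the comparison would be repeated for each band, ideally with a correction for multiple testing.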