Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Bundoora, Victoria 3086, Australia.
La Trobe Institute for Molecular Sciences, La Trobe University, Bundoora, Victoria 3086, Australia.
Anal Chem. 2022 Jun 7;94(22):7804-7813. doi: 10.1021/acs.analchem.1c05453. Epub 2022 May 26.
Feature extraction algorithms are an important class of unsupervised methods used to reduce data dimensionality. They have been applied extensively to time-of-flight secondary ion mass spectrometry (ToF-SIMS) imaging, most commonly in the form of matrix factorization (MF) techniques such as principal component analysis. A limitation of MF is the assumption of linearity, which is generally not accurate for ToF-SIMS data. Recently, nonlinear autoencoders have been shown to outperform MF techniques for ToF-SIMS image feature extraction. However, another limitation of most feature extraction methods (including autoencoders), and one that is particularly important for hyperspectral data, is that they do not consider spatial information. To address this limitation, we describe the application of the convolutional autoencoder (CNNAE) to hyperspectral ToF-SIMS imaging data. The CNNAE is an artificial neural network developed specifically for hyperspectral data that uses convolutional layers for image encoding, thereby explicitly incorporating pixel neighborhood information. We compared the performance of the CNNAE with that of other common feature extraction algorithms on two biological ToF-SIMS imaging data sets. We investigated the extracted features and used the dimensionality-reduced data to train additional machine learning (ML) algorithms. By converting two-dimensional convolutional layers to three-dimensional (3D) ones, we also showed how the CNNAE can be extended to 3D ToF-SIMS images. In general, the CNNAE produced features with significantly higher contrast and autocorrelation than other techniques. Furthermore, histologically recognizable features in the data were more accurately represented. The extension of the CNNAE to 3D data also provided an important proof of principle for the analysis of more complex 3D data sets.
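To illustrate what "explicitly incorporating pixel neighborhood information" can mean in practice, the following pure-Python sketch contrasts a pixel-wise linear encoding (each pixel is projected from its own spectrum only, as in MF-style methods) with a single 3x3 convolutional encoding (each pixel is projected from the spectra in its neighborhood). This is not the CNNAE from the paper: there is no training, no nonlinearity, and no decoder, and the toy cube, the spectral weights, and the function names are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: one fixed, untrained 3x3 convolution over a toy
# hyperspectral cube, compared with a purely pixel-wise linear projection.
# All names, sizes, and weights below are assumptions, not the paper's model.

def make_cube(height, width, channels):
    """Return a height x width x channels cube filled with zeros."""
    return [[[0.0] * channels for _ in range(width)] for _ in range(height)]


def project(spectrum, weights):
    """Weighted sum over the spectral channels of a single pixel."""
    return sum(w * v for w, v in zip(weights, spectrum))


def pixelwise_encode(cube, weights):
    """One feature per pixel, computed from that pixel's own spectrum only."""
    return [[project(spectrum, weights) for spectrum in row] for row in cube]


def conv3x3_encode(cube, weights):
    """One feature per pixel, computed from its 3x3 neighbourhood.

    The same spectral weights are applied at all nine kernel positions and
    the result is divided by 9; pixels outside the image count as zero.
    """
    height, width = len(cube), len(cube[0])
    features = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += project(cube[rr][cc], weights)
            features[r][c] = total / 9.0
    return features


# Toy data: a 5 x 5 image with 3 mass channels and a single non-zero pixel.
cube = make_cube(5, 5, 3)
cube[2][2] = [1.0, 2.0, 3.0]
weights = [1.0, 0.5, 0.0]  # hypothetical spectral loadings

pixel_features = pixelwise_encode(cube, weights)
conv_features = conv3x3_encode(cube, weights)

# The pixel-wise feature of a neighbouring pixel ignores the signal next to
# it, whereas the convolutional feature of the same pixel responds to it.
print("pixel-wise, neighbour of the signal:", pixel_features[2][1])
print("3x3 conv,   neighbour of the signal:", conv_features[2][1])
```

In a real convolutional autoencoder the kernel weights would be learned and stacked with nonlinear activations; extending the two spatial loops with a third (depth) loop is the same idea as replacing 2D convolutional layers with 3D ones.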