STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium.
Department of Cellular and Molecular Medicine, KU Leuven Campus Gasthuisberg O&N 2, Leuven, Belgium.
Rapid Commun Mass Spectrom. 2021 Nov 15;35(21):e9181. doi: 10.1002/rcm.9181.
Non-negative matrix factorization (NMF) has been used extensively for the analysis of mass spectrometry imaging (MSI) data, as it visualizes the spatial and spectral distributions present in a tissue slice simultaneously. The statistical framework offers two methods related to NMF: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), the latter being a generative model. This work offers a mathematical comparison of NMF, PLSA, and LDA and, for the first time, includes a detailed evaluation of Kullback-Leibler NMF (KL-NMF) for MSI. We inspect the results for MSI data analysis, because these different mathematical approaches impose different characteristics on the data and the resulting decomposition.
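As a reading aid, the objectives usually associated with these methods can be written side by side. The notation is an assumption and is not taken from the paper: $X$ is a non-negative data matrix (here taken as pixels by m/z bins), and $W$ and $H$ are the non-negative spatial and spectral factors of a chosen rank.

```latex
% Standard NMF: least-squares (Frobenius) reconstruction error
\min_{W \ge 0,\; H \ge 0} \; \lVert X - WH \rVert_F^2
  = \sum_{i,j} \bigl( X_{ij} - (WH)_{ij} \bigr)^2

% KL-NMF: generalized Kullback-Leibler divergence
\min_{W \ge 0,\; H \ge 0} \; D_{\mathrm{KL}}(X \,\Vert\, WH)
  = \sum_{i,j} \Bigl( X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij} + (WH)_{ij} \Bigr)

% PLSA: maximum likelihood of a mixture model over latent components z_k
\max \; \sum_{i,j} X_{ij} \log \sum_{k} P(z_k)\, P(i \mid z_k)\, P(j \mid z_k)
```

LDA uses the same mixture structure as PLSA but places Dirichlet priors on the mixing proportions, which makes it a fully generative model.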
The four methods (NMF, KL-NMF, PLSA, and LDA) are compared on seven samples: three originating from mouse pancreas and four from human lymph node tissue, all acquired using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS).
Although matrix factorization methods are often used for the analysis of MSI data, we find that each method has different implications for the exactness and interpretability of the results. We obtained promising results with KL-NMF, which has only rarely been used for MSI so far: it improves on both NMF and PLSA. We also show that the KL-NMF and PLSA algorithms, hitherto stated to be equivalent, do differ in the case of MSI data analysis. LDA, assumed to be the better method in the field of text mining, is outperformed by PLSA in the setting of MALDI-MSI. Additionally, the molecular results of the human lymph node data were analyzed thoroughly to better assess the methods under investigation.
We present an in-depth comparison of multiple NMF-related factorization methods for MSI. We aim to give fellow researchers in the field of MSI a clear understanding of the mathematical implications of using each of these analytical techniques, which may affect the exactness and interpretation of the results.
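To make KL-NMF, one of the four compared methods, more concrete, the following is a minimal sketch of the classic multiplicative-update scheme for the generalized Kullback-Leibler objective. It is not the authors' implementation. The function names `kl_nmf` and `kl_divergence`, the toy matrix, the rank, and the iteration count are all assumptions chosen for illustration, and the random data merely stands in for a pixels-by-m/z MSI matrix.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-12):
    """Generalized KL divergence D(X || WH) for non-negative arrays."""
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))


def kl_nmf(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize X ~ W @ H with multiplicative updates for the KL objective.

    X is assumed to be (pixels x m/z bins) and non-negative; W then holds
    spatial components and H the matching spectral components.
    """
    rng = np.random.default_rng(seed)
    n_pixels, n_bins = X.shape
    W = rng.random((n_pixels, rank)) + eps
    H = rng.random((rank, n_bins)) + eps
    history = []
    for _ in range(n_iter):
        # Update spectral factors: H <- H * (W^T (X / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update spatial factors: W <- W * ((X / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(X, W @ H))
    return W, H, history


# Toy stand-in for an MSI data matrix: 30 "pixels" x 50 "m/z bins", rank 4.
rng = np.random.default_rng(42)
X = rng.random((30, 4)) @ rng.random((4, 50))
W, H, history = kl_nmf(X, rank=4)
print(f"KL divergence: first iteration {history[0]:.4f}, last {history[-1]:.4f}")
```

Dividing each column of `W` and each row of `H` by its sum, and collecting the scale factors separately, rewrites the factorization in the probabilistic form used by PLSA. This is the sense in which the two methods have been described as equivalent, although the abstract reports that the algorithms behave differently on MSI data.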