Institute for Algebra, Geometry, Topology and their Applications (ALTA), University of Bremen, 28359, Bremen, Germany.
Institute for Statistics, University of Bremen, 28359, Bremen, Germany.
BMC Bioinformatics. 2023 Jul 10;24(1):279. doi: 10.1186/s12859-023-05402-0.
Matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI MSI) shows significant potential for applications in cancer research, especially in tumor typing and subtyping. Lung cancer is the leading cause of tumor-related deaths, and its most lethal entities are adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). Distinguishing between these two common subtypes is crucial for therapy decisions and successful patient management.
We propose a new algebraic topological framework that extracts intrinsic information from MALDI data and transforms it to reflect topological persistence. Our framework offers two main advantages. First, topological persistence helps distinguish signal from noise. Second, it compresses the MALDI data, saving storage space and reducing computation time for subsequent classification tasks. We present an algorithm that efficiently implements our topological framework and relies on a single tuning parameter. Logistic regression and random forest classifiers are then applied to the extracted persistence features, yielding an automated tumor (sub-)typing process. To demonstrate the competitiveness of the proposed framework, we conduct cross-validated experiments on a real-world MALDI dataset. Furthermore, we demonstrate the effectiveness of the single denoising parameter by evaluating its performance on synthetic MALDI images with varying levels of noise.
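To make the idea of persistence-based denoising and compression concrete, the following is a minimal sketch of 0-dimensional persistence for the peaks of a single one-dimensional spectrum. It is a generic textbook construction (superlevel-set filtration with the elder rule), not the authors' algorithm, whose details the abstract does not give. The function names `peak_persistence` and `persistent_peaks`, the toy spectrum, and the way `threshold` plays the role of a single tuning parameter are all assumptions made for illustration.

```python
def peak_persistence(signal):
    """Return (peak_index, birth, death) triples for a 1D signal.

    Illustrative only: 0-dimensional persistence of the superlevel-set
    filtration. A peak is born at its own height and dies at the height
    where its component merges into one containing a higher peak.
    """
    n = len(signal)
    # Visit samples from highest to lowest intensity (stable for ties).
    order = sorted(range(n), key=lambda i: -signal[i])
    parent = [None] * n  # union-find forest; None means "not added yet"

    def find(i):
        # The root of a component is always its highest sample.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component with the lower peak dies.
                old, young = (a, b) if signal[a] > signal[b] else (b, a)
                if signal[young] > signal[i]:  # skip zero-persistence pairs
                    pairs.append((young, signal[young], signal[i]))
                parent[young] = old
    if n:
        # The global maximum never merges; close it at the global minimum.
        top = find(0)
        pairs.append((top, signal[top], min(signal)))
    return pairs


def persistent_peaks(signal, threshold):
    """Keep only peaks whose persistence (birth - death) is >= threshold.

    `threshold` is a hypothetical stand-in for a single denoising parameter.
    """
    kept = [p for p in peak_persistence(signal) if p[1] - p[2] >= threshold]
    return sorted(kept, key=lambda p: -(p[1] - p[2]))


# Toy spectrum (made-up intensities): two strong peaks and one small bump.
spectrum = [0, 3, 1, 4, 0, 0.5, 0]
print(persistent_peaks(spectrum, 1.0))  # [(3, 4, 0), (1, 3, 1)]
```

In this toy example the bump at index 5 has persistence 0.5 and is discarded as noise, while the seven-sample spectrum is reduced to two (index, birth, death) triples. Features of this kind could then be passed to a standard classifier such as logistic regression or a random forest.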
Our experiments demonstrate that the proposed algebraic topological framework successfully captures and leverages the intrinsic spectral information in MALDI data, leading to competitive results in classifying lung cancer subtypes. Moreover, the ability to tune the framework for denoising through a single parameter highlights its versatility and its potential for enhancing data analysis in MALDI applications.