School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun, 130012, People's Republic of China.
Sci Rep. 2023 Sep 21;13(1):15694. doi: 10.1038/s41598-023-42395-5.
Mass spectrometry can dynamically detect many complex matrix samples in a simple, rapid, non-invasive, precise, and high-throughput manner, and has become an indispensable tool in accurate diagnosis. Mass spectrometry data analysis mainly aims to quantify all metabolites in an organism and to relate those metabolites to physiological and pathological changes. A feature construction method for mass spectrometry data (MSFS) is proposed to construct features from the original mass spectrometry data, so as to reduce noise, reduce redundancy in the original data, and increase the information content of the data. A chi-square test is used to select the optimal non-redundant feature subset from the high-dimensional features. The optimal feature subset is then analyzed visually and mapped back to the corresponding intervals of the original mass spectrum. Ten supervised learning models are trained, and their classification performance is assessed with several evaluation metrics. The feasibility of the proposed method is verified on two public mass spectrometry datasets. On the coronary heart disease dataset, the classification accuracy on the test set reached 1.000 during the identification of mixed batch samples; during the recognition process, the classification accuracy on the test set reached 0.979. On the colorectal liver metastases dataset, the classification accuracy on the test set reached 1.000. This paper uses a new preprocessing method to align the raw mass spectrometry data, which significantly improves classification accuracy and offers a new approach to mass spectrometry data analysis. Compared with MetaboAnalyst software and existing experimental results, the proposed method obtained better classification results.
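The chi-square selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `chi2_scores` and `select_top_k`, the toy intensity matrix, and the choice of keeping a fixed number `k` of features are all assumptions made for illustration. The statistic is the common one for non-negative features, comparing the summed feature intensity observed in each class with the amount expected if the feature were independent of the class. It ranks features by relevance only and does not by itself remove redundant features.

```python
def chi2_scores(X, y):
    """Chi-square score per feature for non-negative features X and labels y.

    Illustrative sketch: observed = summed feature value within each class,
    expected = class frequency * total feature value over all samples.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {
        c: sum(1 for label in y if label == c) / n_samples for c in classes
    }
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:  # skip empty cells to avoid division by zero
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features and the reduced matrix."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: 4 samples x 3 constructed features (non-negative intensities).
# Features 0 and 1 differ between the two classes; feature 2 is constant.
X = [
    [9.0, 1.0, 5.0],
    [8.0, 2.0, 5.0],
    [1.0, 9.0, 5.0],
    [2.0, 8.0, 5.0],
]
y = [0, 0, 1, 1]

keep, X_selected = select_top_k(X, y, k=2)
print(keep)
```

In this toy case the constant feature scores zero and is dropped, while the two class-dependent features are retained. The retained column indices are what would be mapped back to intervals of the original spectrum, and the reduced matrix is what would be passed to the supervised classifiers.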