Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data.

Author information

Wu Baolin, Abbott Tom, Fishman David, McMurray Walter, Mor Gil, Stone Kathryn, Ward David, Williams Kenneth, Zhao Hongyu

Affiliation

Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA.

Publication information

Bioinformatics. 2003 Sep 1;19(13):1636-43. doi: 10.1093/bioinformatics/btg210.

PMID:12967959

Abstract

MOTIVATION

Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data.

RESULTS

We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.
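The kind of comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (generated with `make_classification` as a stand-in for MS peak intensities), scikit-learn is an assumed toolkit, and 5-fold cross-validated error is an assumed evaluation protocol that the abstract does not specify. All hyperparameters are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic two-class data standing in for peak intensities (assumption:
# 120 samples, 15 features, 6 of them informative; not the paper's data).
X, y = make_classification(
    n_samples=120,
    n_features=15,
    n_informative=6,
    n_redundant=0,
    random_state=0,
)

# One representative of each method class named in the abstract.
# Bagging and boosting use scikit-learn's default decision-tree base learners.
classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "Boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class balance similar across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

error_rates = {}
for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    error_rates[name] = 1.0 - accuracy.mean()

# Report methods from lowest to highest cross-validated error.
for name, err in sorted(error_rates.items(), key=lambda item: item[1]):
    print(f"{name:9s} CV error = {err:.3f}")
```

The ranking produced on synthetic data says nothing about the ranking on real spectra; the sketch only shows how several classifiers can be scored under one shared resampling scheme so that their error rates are comparable.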

Similar articles

Sample classification from protein mass spectrometry, by 'peak probability contrasts'.

Bioinformatics. 2004 Nov 22;20(17):3034-44. doi: 10.1093/bioinformatics/bth357. Epub 2004 Jun 29.

Proteomic biomarker identification for diagnosis of early relapse in ovarian cancer.

J Bioinform Comput Biol. 2006 Dec;4(6):1159-79. doi: 10.1142/s0219720006002399.

Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data.

Bioinformatics. 2005 May 15;21(10):2200-9. doi: 10.1093/bioinformatics/bti370. Epub 2005 Mar 22.

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve.

BMC Bioinformatics. 2007 Dec 12;8:477. doi: 10.1186/1471-2105-8-477.

High-resolution serum proteomic features for ovarian cancer detection.

Endocr Relat Cancer. 2004 Jun;11(2):163-78. doi: 10.1677/erc.0.0110163.

[Identification of serum biomarkers for ovarian cancer using protein chips and time of flight mass spectrometry technology].

Zhonghua Fu Chan Ke Za Zhi. 2006 Aug;41(8):544-8.

Plasma proteomic pattern as biomarkers for ovarian cancer.

Int J Gynecol Cancer. 2006 Jan-Feb;16 Suppl 1:139-46. doi: 10.1111/j.1525-1438.2006.00475.x.

Computer assisted optical screening of human ovarian cancer using Raman spectroscopy.

Photodiagnosis Photodyn Ther. 2016 Sep;15:94-9. doi: 10.1016/j.pdpdt.2016.05.011. Epub 2016 May 26.

Cited by

MSMCE: A novel representation module for classification of raw mass spectrometry data.

PLoS One. 2025 Aug 6;20(8):e0321239. doi: 10.1371/journal.pone.0321239. eCollection 2025.

Added value of inflammatory plasma biomarkers to pathologic biomarkers in predicting preclinical Alzheimer's disease.

J Alzheimers Dis. 2024 Nov;102(1):89-98. doi: 10.1177/13872877241283692. Epub 2024 Oct 3.

Follicular Fluid Proteomic Analysis to Identify Predictive Markers of Normal Embryonic Development.

Int J Mol Sci. 2024 Aug 1;25(15):8431. doi: 10.3390/ijms25158431.

Differential Serum Peptidomics Reveal Multi-Marker Models That Predict Breast Cancer Progression.

Cancers (Basel). 2024 Jun 27;16(13):2365. doi: 10.3390/cancers16132365.

MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data.

Sci Rep. 2023 Sep 21;13(1):15694. doi: 10.1038/s41598-023-42395-5.

Early Diagnosis: End-to-End CNN-LSTM Models for Mass Spectrometry Data Classification.

Anal Chem. 2023 Sep 12;95(36):13431-13437. doi: 10.1021/acs.analchem.3c00613. Epub 2023 Aug 25.

Characterization of novel loci controlling seed oil content in Brassica napus by marker metabolite-based multi-omics analysis.

Genome Biol. 2023 Jun 19;24(1):141. doi: 10.1186/s13059-023-02984-z.

The impact of stress and anesthesia on animal models of infectious disease.

Front Vet Sci. 2023 Feb 2;10:1086003. doi: 10.3389/fvets.2023.1086003. eCollection 2023.

Virtual screening of Indonesian herbal compounds as COVID-19 supportive therapy: machine learning and pharmacophore modeling approaches.

BMC Complement Med Ther. 2022 Aug 3;22(1):207. doi: 10.1186/s12906-022-03686-y.

On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples.

J Healthc Inform Res. 2018 May 22;2(3):305-318. doi: 10.1007/s41666-018-0022-0. eCollection 2018 Sep.
