Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data.

Author information

Wu Baolin, Abbott Tom, Fishman David, McMurray Walter, Mor Gil, Stone Kathryn, Ward David, Williams Kenneth, Zhao Hongyu

Affiliation

Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA.

Publication information

Bioinformatics. 2003 Sep 1;19(13):1636-43. doi: 10.1093/bioinformatics/btg210.

Abstract

MOTIVATION

Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data.

RESULTS

We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.
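To make the kind of comparison described above concrete, here is a minimal sketch of evaluating one of the listed methods, a k-nearest neighbor classifier, by leave-one-out error on simulated peak intensities. Everything in it is an assumption for illustration: the synthetic data generator, the function names (`make_synthetic_spectra`, `knn_predict`, `loo_error`), the class labels, and the use of leave-one-out validation are not taken from the paper, whose abstract does not describe its evaluation procedure or preprocessing.

```python
import random
from collections import Counter


def make_synthetic_spectra(n_per_class=20, n_peaks=15, shift=1.5, seed=0):
    """Simulate peak intensities (hypothetical data, not the study's spectra).

    The first three peaks are shifted upward in the 'cancer' class.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = control, 1 = cancer (illustrative labels)
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_peaks)]
            if label == 1:
                for j in range(3):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x."""
    # Squared Euclidean distance preserves the neighbor ordering.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_error(X, y, k=3):
    """Leave-one-out misclassification rate for the k-NN classifier."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    for k in (1, 3, 5):
        print(f"k={k}: leave-one-out error = {loo_error(X, y, k):.3f}")
```

The same held-out error estimate could be computed for each of the other classifiers on identical splits, which is what makes a side-by-side comparison meaningful. Odd values of k are used so that a two-class vote cannot tie.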
