Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data.

Affiliation

Department of Chemistry, Tongji University, Shanghai, 200092, China.

Publication information

BMC Bioinformatics. 2010 Feb 27;11:109. doi: 10.1186/1471-2105-11-109.

Abstract

BACKGROUND

Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early-stage cancers. However, dimensionality reduction and classification are considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer.

RESULTS

We propose a method based on statistical moments to reduce feature dimensions. After refining and t-testing, SELDI-TOF data are divided into several intervals. Four statistical moments (mean, variance, skewness and kurtosis) are calculated for each interval and are used as representative variables. The high dimensionality of the data can thus be rapidly reduced. To improve efficiency and classification performance, the reduced data are then used in kernel PLS models. Over 100 runs of five-fold cross-validation, the method achieved an average sensitivity of 0.9950, specificity of 0.9916, accuracy of 0.9935 and correlation coefficient of 0.9869. Furthermore, only one control was misclassified in leave-one-out cross-validation.
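The interval-moment step can be illustrated with a minimal sketch. The abstract does not say how intervals are chosen or which estimators are used, so the following are assumptions made only for illustration: intervals are contiguous and of near-equal width, moments are population (biased) estimates, and kurtosis is the plain fourth standardized moment. The function names `four_moments` and `moment_features` are hypothetical, and the refining, t-testing and kernel PLS steps are not shown.

```python
def four_moments(values):
    """Return (mean, variance, skewness, kurtosis) of a non-empty sequence.

    Assumption: population (biased) estimators; kurtosis is the fourth
    standardized moment (3.0 for a normal distribution), not excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        # A constant interval has no shape; report zero skewness and kurtosis.
        return mean, 0.0, 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def moment_features(intensities, n_intervals):
    """Split one spectrum into contiguous intervals and concatenate their moments.

    Assumption: near-equal-width intervals; the source does not specify
    how its intervals are defined.
    """
    n = len(intensities)
    if not 1 <= n_intervals <= n:
        raise ValueError("n_intervals must be between 1 and len(intensities)")
    features = []
    for k in range(n_intervals):
        # Integer arithmetic spreads any remainder across the intervals.
        start = k * n // n_intervals
        stop = (k + 1) * n // n_intervals
        features.extend(four_moments(intensities[start:stop]))
    return features


# Toy spectrum: 8 intensity values reduced to 2 intervals x 4 moments.
spectrum = [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0]
features = moment_features(spectrum, n_intervals=2)
print(len(features))  # 8 features replace the 8 raw values in this toy case
```

With a real spectrum of many thousands of m/z points and a few dozen intervals, the same call would return four features per interval, which is the sense in which the dimensionality drops quickly.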

CONCLUSION

The proposed method is suitable for analyzing high-throughput proteomics data.
