A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection.

Affiliation

Department of Biological and Environmental Sciences, University of Sannio, Via Port'Arsa 11, Benevento, Italy.

Publication information

BMC Bioinformatics. 2009 Oct 15;10 Suppl 12(Suppl 12):S9. doi: 10.1186/1471-2105-10-S12-S9.

DOI:10.1186/1471-2105-10-S12-S9

PMID:19828085

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2762074/

Abstract

BACKGROUND

Mass spectra, widely used in proteomics studies as a screening tool for protein profiling and for detecting discriminatory signals, are high-dimensional data. A large number of local maxima (i.e., peaks) have to be analyzed as part of computational pipelines aimed at building efficient predictive and screening protocols. With data of this dimensionality and sample size, the risk of over-fitting and selection bias is pervasive. Developing bioinformatics methods based on unsupervised feature extraction can therefore yield general tools applicable to several fields of predictive proteomics.

RESULTS

We propose a method for feature selection and extraction, grounded in the theory of multi-scale spaces, for high-resolution spectra derived from the analysis of serum. We then use support vector machines for classification. In particular, we use a database containing 216 sample spectra, divided into 115 cancer and 91 control samples. The overall accuracy, averaged over a large cross-validation study, is 98.18%. The area under the ROC curve of the best selected model is 0.9962.
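As a rough illustration of what unsupervised, scale-space peak selection can look like, the sketch below smooths a one-dimensional spectrum with Gaussians of increasing width and keeps only the local maxima that can still be tracked at every coarser scale. It is not the authors' algorithm (their released code is MATLAB and the abstract gives no implementation details); the function names, the list of scales, the `tolerance` parameter, and the synthetic spectrum are all assumptions made for this example. No class labels are used at any point, which is what makes the selection unsupervised.

```python
import math

def gaussian_kernel(sigma):
    """Discrete Gaussian truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve with a Gaussian; indices are clamped at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def local_maxima(signal):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]

def persistent_peaks(signal, sigmas, tolerance=2):
    """Return finest-scale maxima that can be followed through all scales.

    `tolerance` (hypothetical parameter) is the maximum index drift allowed
    when matching a peak from one scale to the next coarser one.
    """
    sigmas = sorted(sigmas)
    # Map each finest-scale peak (origin) to its current tracked position.
    tracked = {p: p for p in local_maxima(smooth(signal, sigmas[0]))}
    for sigma in sigmas[1:]:
        maxima = local_maxima(smooth(signal, sigma))
        survivors = {}
        for origin, pos in tracked.items():
            near = [m for m in maxima if abs(m - pos) <= tolerance]
            if near:
                survivors[origin] = min(near, key=lambda m: abs(m - pos))
        tracked = survivors
    return sorted(tracked)

# Synthetic spectrum (illustrative only): two broad peaks centred at
# indices 30 and 70, plus a small high-frequency ripple standing in for noise.
spectrum = [math.exp(-((i - 30) ** 2) / 50.0)
            + 0.6 * math.exp(-((i - 70) ** 2) / 50.0)
            + 0.02 * math.sin(2.0 * i)
            for i in range(100)]
peaks = persistent_peaks(spectrum, sigmas=[1.0, 2.0, 4.0])
print(peaks)
```

In this toy case the two broad peaks survive every scale, while most ripple-induced maxima disappear as the smoothing widens. The intensities at the retained positions could then serve as a reduced feature vector for a classifier such as a support vector machine.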

CONCLUSION

We improved on previously known results for this problem on the same data, with the added advantage that the proposed method has an unsupervised feature selection phase. All the developed code, in the form of MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6021/2762074/f23bbc09ce59/1471-2105-10-S12-S9-1.jpg

Similar articles

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines.

BMC Bioinformatics. 2009 Aug 22;10:259. doi: 10.1186/1471-2105-10-259.

Comparison of feature selection and classification for MALDI-MS data.

BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S3. doi: 10.1186/1471-2164-10-S1-S3.

On the analysis of glycomics mass spectrometry data via the regularized area under the ROC curve.

BMC Bioinformatics. 2007 Dec 12;8:477. doi: 10.1186/1471-2105-8-477.

Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery.

Stat Appl Genet Mol Biol. 2013 Mar 13;12(2):207-23. doi: 10.1515/sagmb-2012-0067.

Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data.

Bioinformatics. 2005 Jun;21 Suppl 1:i487-94. doi: 10.1093/bioinformatics/bti1030.

Peak selection from MALDI-TOF mass spectra using ant colony optimization.

Bioinformatics. 2007 Mar 1;23(5):619-26. doi: 10.1093/bioinformatics/btl678. Epub 2007 Jan 19.

Classification algorithms for phenotype prediction in genomics and proteomics.

Front Biosci. 2008 Jan 1;13:691-708. doi: 10.2741/2712.

Machine learning methods for predictive proteomics.

Brief Bioinform. 2008 Mar;9(2):119-28. doi: 10.1093/bib/bbn008. Epub 2008 Feb 29.

Cited by

SMoFFI-SegFormer: a novel approach for ovarian tumor segmentation based on an improved SegFormer architecture.

Front Oncol. 2025 Jul 21;15:1555585. doi: 10.3389/fonc.2025.1555585. eCollection 2025.

Intelligence Algorithms for Protein Classification by Mass Spectrometry.

Biomed Res Int. 2018 Nov 11;2018:2862458. doi: 10.1155/2018/2862458. eCollection 2018.

References

Machine learning methods for predictive proteomics.

Brief Bioinform. 2008 Mar;9(2):119-28. doi: 10.1093/bib/bbn008. Epub 2008 Feb 29.

Improved model-based, platform-independent feature extraction for mass spectrometry.

Bioinformatics. 2007 Oct 1;23(19):2528-35. doi: 10.1093/bioinformatics/btm385. Epub 2007 Aug 13.

Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data.

BMC Bioinformatics. 2006 Apr 10;7:197. doi: 10.1186/1471-2105-7-197.

Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data.

Bioinformatics. 2005 May 15;21(10):2200-9. doi: 10.1093/bioinformatics/bti370. Epub 2005 Mar 22.

Sample classification from protein mass spectrometry, by 'peak probability contrasts'.

Bioinformatics. 2004 Nov 22;20(17):3034-44. doi: 10.1093/bioinformatics/bth357. Epub 2004 Jun 29.

High-resolution serum proteomic features for ovarian cancer detection.

Endocr Relat Cancer. 2004 Jun;11(2):163-78. doi: 10.1677/erc.0.0110163.

Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum.

J Comput Biol. 2003;10(6):925-46. doi: 10.1089/106652703322756159.

Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments.

Bioinformatics. 2004 Mar 22;20(5):777-85. doi: 10.1093/bioinformatics/btg484. Epub 2004 Jan 29.

Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

BMC Bioinformatics. 2003 Nov 6;4:54. doi: 10.1186/1471-2105-4-54.

Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data.

Bioinformatics. 2003 Sep 1;19(13):1636-43. doi: 10.1093/bioinformatics/btg210.
