A critical assessment of feature selection methods for biomarker discovery in clinical proteomics.

Affiliation

Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands.

Publication information

Mol Cell Proteomics. 2013 Jan;12(1):263-76. doi: 10.1074/mcp.M112.022566. Epub 2012 Oct 31.


DOI:10.1074/mcp.M112.022566
PMID:23115301
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3536906/
Abstract

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive features elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The comparison uses human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
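
The two evaluation scores named in the abstract can be illustrated with a minimal Python sketch. The abstract only defines the f-score as the harmonic mean of recall and precision and the g-score as the geometric mean of recall and true negative rate. Everything else here is an assumption made for illustration: the set-based counting of true and false positives, the function name `selection_scores`, the convention of returning 0.0 when a ratio is undefined, and the toy feature indices.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Return (f_score, g_score) for one selected feature set.

    selected     -- features chosen by a feature selection method
    spiked       -- features known to relate to spiked compounds (ground truth)
    all_features -- every feature in the data set
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)

    tp = len(selected & spiked)                  # spiked features that were selected
    fp = len(selected - spiked)                  # unrelated features that were selected
    fn = len(spiked - selected)                  # spiked features that were missed
    tn = len(all_features - selected - spiked)   # unrelated features correctly left out

    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0

    # f-score: harmonic mean of recall and precision.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # g-score: geometric mean of recall and true negative rate.
    g_score = sqrt(recall * tnr)
    return f_score, g_score


# Toy data (hypothetical): 100 features, the first 10 relate to spiked peptides.
all_features = range(100)
spiked = range(10)
selected = list(range(8)) + [50, 51]   # 8 true hits, 2 false hits, 2 misses

f_score, g_score = selection_scores(selected, spiked, all_features)
print(f"f-score = {f_score:.3f}, g-score = {g_score:.3f}")
```

Because unrelated features usually far outnumber spiked ones, the true negative rate stays close to 1 even when several unrelated features are selected, so in this sketch the g-score mostly tracks recall while the f-score also penalizes false positives through precision.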
