Feature Selection for Classification of SELDI-TOF-MS Proteomic Profiles.

Author information

Hauskrecht Milos, Pelikan Richard, Malehorn David E, Bigbee William L, Lotze Michael T, Zeh Herbert J, Whitcomb David C, Lyons-Weiler James

Affiliations

Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; University of Pittsburgh Cancer Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Publication information

Appl Bioinformatics. 2005;4(4):227-46. doi: 10.2165/00822942-200504040-00003.

DOI:10.2165/00822942-200504040-00003
PMID:16309341
Abstract

BACKGROUND

Proteomic peptide profiling is an emerging technology harbouring great expectations to enable early detection, enhance diagnosis and more clearly define prognosis of many diseases. Although previous research work has illustrated the ability of proteomic data to discriminate between cases and controls, significantly less attention has been paid to the analysis of feature selection strategies that enable learning of such predictive models. Feature selection, in addition to classification, plays an important role in successful identification of proteomic biomarker panels.

METHODS

We present a new, efficient, multivariate feature selection strategy that extracts useful feature panels directly from the high-throughput spectra. The strategy takes advantage of the characteristics of surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF-MS) profiles and enhances widely used univariate feature selection strategies with a heuristic based on multivariate de-correlation filtering. We analyse and compare two versions of the method: one in which all feature pairs must adhere to a maximum allowed correlation (MAC) threshold, and another in which the feature panel is built greedily by deciding among best univariate features at different MAC levels.
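The first variant described above (every pair of selected features must respect the MAC threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the univariate criterion, so the Fisher-like score used here, the function names `pearson`, `univariate_score` and `select_with_mac`, and the toy data are all assumptions.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def univariate_score(values, labels):
    """Assumed scoring rule: squared gap between class means over summed class variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(pos) ** 2 + pstdev(neg) ** 2
    gap = (mean(pos) - mean(neg)) ** 2
    if spread > 0:
        return gap / spread
    return math.inf if gap > 0 else 0.0


def select_with_mac(columns, labels, k, mac):
    """Pick up to k feature indices, best univariate score first, skipping any
    candidate whose |correlation| with an already chosen feature exceeds mac."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: univariate_score(columns[j], labels),
                    reverse=True)
    chosen = []
    for j in ranked:
        if len(chosen) == k:
            break
        if all(abs(pearson(columns[j], columns[i])) <= mac for i in chosen):
            chosen.append(j)
    return chosen


# Toy data (invented): each inner list is one feature, e.g. the intensity at one
# m/z position, measured on six samples; labels mark cases (1) and controls (0).
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # strongly discriminative
    [4.6, 5.4, 5.0, 1.3, 1.0, 0.8],  # discriminative, but nearly redundant with feature 0
    [2.0, 3.0, 2.5, 1.0, 2.2, 1.4],  # weaker, and less correlated with feature 0
    [1.0, 2.0, 1.5, 1.6, 1.1, 2.0],  # mostly noise
]
panel = select_with_mac(columns, labels, k=2, mac=0.9)
print(panel)
```

On this toy data a purely univariate ranking would take features 0 and 1, which carry almost the same information; with `mac=0.9` the filter skips feature 1 and the panel becomes `[0, 2]`. The second variant in the abstract, which chooses greedily among the best univariate features at several MAC levels, is not specified in enough detail here to sketch.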

RESULTS

The analysis and comparison of feature selection strategies was carried out experimentally on the pancreatic cancer dataset with 57 cancers and 59 controls from the University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania, USA. The analysis was conducted in both the whole-profile and peak-only modes. The results clearly show the benefit of the new strategy over univariate feature selection methods in terms of improved classification performance.

CONCLUSION

Understanding the characteristics of the spectra allows us to better assess the relative importance of potential features in the diagnosis of cancer. Incorporation of these characteristics into feature selection strategies often leads to a more efficient data analysis as well as improved classification performance.

Similar articles

1
Feature Selection for Classification of SELDI-TOF-MS Proteomic Profiles.
Appl Bioinformatics. 2005;4(4):227-46. doi: 10.2165/00822942-200504040-00003.
2
Pancreatic cancer biomarkers discovery by surface-enhanced laser desorption and ionization time-of-flight mass spectrometry.
Clin Chem Lab Med. 2009;47(6):713-23. doi: 10.1515/CCLM.2009.158.
3
Identification of protein biomarkers in Dupuytren's contracture using surface enhanced laser desorption ionization time-of-flight mass spectrometry (SELDI-TOF-MS).
Clin Invest Med. 2006 Jun;29(3):136-45.
4
A hybrid feature subset selection algorithm for analysis of high correlation proteomic data.
J Med Signals Sens. 2012 Jul;2(3):161-8.
5
Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS).
BMC Bioinformatics. 2005 Jul 15;6 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-6-S2-S5.
6
A serum proteomic pattern for the detection of colorectal adenocarcinoma using surface enhanced laser desorption and ionization mass spectrometry.
Cancer Invest. 2006 Dec;24(8):747-53. doi: 10.1080/07357900601063873.
7
Saliva analysis by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) in orthodontic treatment: first pilot study.
Prog Orthod. 2011 Nov;12(2):126-31. doi: 10.1016/j.pio.2011.06.002. Epub 2011 Jul 26.
8
Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.
J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.
9
Guilt-by-association feature selection: identifying biomarkers from proteomic profiles.
J Biomed Inform. 2008 Feb;41(1):124-36. doi: 10.1016/j.jbi.2007.04.003. Epub 2007 Apr 14.
10
Feature selection and nearest centroid classification for protein mass spectrometry.
BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Cited by

1
Change-point detection method for clinical decision support system rule monitoring.
Artif Intell Med. 2018 Sep;91:49-56. doi: 10.1016/j.artmed.2018.06.003. Epub 2018 Jul 3.
2
Biomarkers for pancreatic cancer: recent achievements in proteomics and genomics through classical and multivariate statistical methods.
World J Gastroenterol. 2014 Oct 7;20(37):13325-42. doi: 10.3748/wjg.v20.i37.13325.
3
Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data.
BMC Bioinformatics. 2012 Oct 17;13:270. doi: 10.1186/1471-2105-13-270.
4
Identification of microbial and proteomic biomarkers in early childhood caries.
Int J Dent. 2011;2011:196721. doi: 10.1155/2011/196721. Epub 2011 Oct 16.
5
Inter-session reproducibility measures for high-throughput data sources.
Summit Transl Bioinform. 2008 Mar 1;2008:41-5.
6
Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles.
AMIA Annu Symp Proc. 2010 Nov 13;2010:632-6.
7
Measuring stability of feature selection in biomedical datasets.
AMIA Annu Symp Proc. 2009 Nov 14;2009:406-10.
8
Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data.
BMC Bioinformatics. 2010 Feb 27;11:109. doi: 10.1186/1471-2105-11-109.
9
Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2105-11-S1-S1.
10
Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer.
Cancer Inform. 2007 Feb 14;2:87-97.