
Finding diagnostic biomarkers in proteomic spectra.

Authors

Pratapa Pallavi N, Patz Edward F, Hartemink Alexander J

Affiliation

Duke University, Dept. of Computer Science, Box 90129, Durham, NC 27708, USA.

Publication

Pac Symp Biocomput. 2006:279-90.

PMID:17094246
Abstract

In seeking to find diagnostic biomarkers in proteomic spectra, two significant problems arise. First, not only is there noise in the measured intensity at each m/z value, but there is also noise in the measured m/z value itself. Second, the potential for overfitting is severe: it is easy to find features in the spectra that accurately discriminate disease states but have no biological meaning. We address these problems by developing and testing a series of steps for pre-processing proteomic spectra and extracting putatively meaningful features before presentation to feature selection and classification algorithms. These steps include an HMM-based latent spectrum extraction algorithm for fusing the information from multiple replicate spectra obtained from a single tissue sample, a simple algorithm for baseline correction based on a segmented convex hull, a peak identification and quantification algorithm, and a peak registration algorithm to align peaks from multiple tissue samples into common peak registers. We apply these steps to MALDI spectral data collected from normal and tumor lung tissue samples, and then compare the performance of feature selection with FDR followed by classification with an SVM, versus joint feature selection and classification with Bayesian sparse multinomial logistic regression (SMLR). The SMLR approach outperformed FDR+SVM, but both were effective in achieving good diagnostic accuracy with a small number of features. Some of the selected features have previously been investigated as clinical markers for lung cancer diagnosis; some of the remaining features are excellent candidates for further research.
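The abstract names baseline correction "based on a segmented convex hull" without giving details. The sketch below is one plausible reading, not the authors' implementation: it splits the spectrum into contiguous segments, takes the lower convex hull of the (m/z, intensity) points in each segment as the baseline, and subtracts it. The function names, the equal-size segmentation, and the `n_segments` parameter are all assumptions made for illustration.

```python
from bisect import bisect_right


def lower_hull(points):
    """Lower convex hull of points sorted by strictly increasing x
    (the lower half of Andrew's monotone chain)."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_value(hull, x):
    """Linearly interpolate the piecewise-linear hull at x."""
    if len(hull) == 1:
        return hull[0][1]
    xs = [h[0] for h in hull]
    i = min(max(bisect_right(xs, x) - 1, 0), len(hull) - 2)
    (x1, y1), (x2, y2) = hull[i], hull[i + 1]
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def baseline_correct(mz, intensity, n_segments=4):
    """Subtract a per-segment lower-convex-hull baseline (hypothetical helper).

    mz must be strictly increasing; segments are equal-sized index ranges,
    which is an assumption rather than something the abstract specifies.
    """
    n = len(mz)
    bounds = [k * n // n_segments for k in range(n_segments + 1)]
    corrected = []
    for lo, hi in zip(bounds, bounds[1:]):
        if hi <= lo:
            continue  # empty segment when n < n_segments
        pts = list(zip(mz[lo:hi], intensity[lo:hi]))
        hull = lower_hull(pts)
        corrected.extend(y - hull_value(hull, x) for x, y in pts)
    return corrected


# Toy spectrum: linear drift plus a single peak of height 10 at index 4.
mz = [float(i) for i in range(10)]
intensity = [0.5 * x for x in mz]
intensity[4] += 10.0
flat = baseline_correct(mz, intensity, n_segments=1)
```

Because the lower hull never rises above the data, the corrected intensities are non-negative up to rounding. One limitation of this simple segmentation is that a peak sitting exactly on a segment boundary becomes a hull vertex and is flattened, so in practice the segment boundaries would need to be chosen with care.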

Similar articles

1. Finding diagnostic biomarkers in proteomic spectra.
   Pac Symp Biocomput. 2006:279-90.
2. Peak selection from MALDI-TOF mass spectra using ant colony optimization.
   Bioinformatics. 2007 Mar 1;23(5):619-26. doi: 10.1093/bioinformatics/btl678. Epub 2007 Jan 19.
3. Identification of lung cancer patients by serum protein profiling using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry.
   Am J Clin Oncol. 2008 Apr;31(2):133-9. doi: 10.1097/COC.0b013e318145b98b.
4. Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.
   J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.
5. Feature selection and machine learning with mass spectrometry data.
   Methods Mol Biol. 2013;1007:237-62. doi: 10.1007/978-1-62703-392-3_10.
6. Application of the random forest classification method to peaks detected from mass spectrometric proteomic profiles of cancer patients and controls.
   Stat Appl Genet Mol Biol. 2008;7(2):Article4. doi: 10.2202/1544-6115.1349. Epub 2008 Feb 8.
7. Classification of breast cancer versus normal samples from mass spectrometry profiles using linear discriminant analysis of important features selected by random forest.
   Stat Appl Genet Mol Biol. 2008;7(2):Article7. doi: 10.2202/1544-6115.1345. Epub 2008 Feb 19.
8. Multispectra CWT-based algorithm (MCWT) in mass spectra for peak extraction.
   J Biopharm Stat. 2008;18(5):869-82. doi: 10.1080/10543400802278064.
9. Diagnosis of early relapse in ovarian cancer using serum proteomic profiling.
   Genome Inform. 2005;16(2):195-204.
10. An extended Markov blanket approach to proteomic biomarker detection from high-resolution mass spectrometry data.
    IEEE Trans Inf Technol Biomed. 2009 Mar;13(2):195-206. doi: 10.1109/TITB.2008.2007909. Epub 2008 Dec 31.

Cited by

1. Revealing metabolite biomarkers for acupuncture treatment by linear programming based feature selection.
   BMC Syst Biol. 2012;6 Suppl 1(Suppl 1):S15. doi: 10.1186/1752-0509-6-S1-S15. Epub 2012 Jul 16.
2. Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset.
   BMC Bioinformatics. 2011 May 9;12:140. doi: 10.1186/1471-2105-12-140.