A Bayesian network approach to feature selection in mass spectrometry data.

Affiliation

Department of Physics, The College of William and Mary, Williamsburg, VA, USA.

Publication information

BMC Bioinformatics. 2010 Apr 8;11:177. doi: 10.1186/1471-2105-11-177.

DOI: 10.1186/1471-2105-11-177

PMID: 20377906

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098056/

Abstract

BACKGROUND

Time-of-flight mass spectrometry (TOF-MS) has the potential to provide non-invasive, high-throughput screening for cancers and other serious diseases via detection of protein biomarkers in blood or other accessible biologic samples. Unfortunately, this potential has largely been unrealized to date due to the high variability of measurements, uncertainties in the distribution of proteins in a given population, and the difficulty of extracting repeatable diagnostic markers using current statistical tools. With studies consisting of perhaps only dozens of samples but possibly hundreds of variables, overfitting is a serious complication. To overcome these difficulties, we have developed a Bayesian inductive method that uses model-independent techniques to discover relationships between spectral features. This method appears to efficiently discover network models that not only identify connections between the disease and key features but also organize relationships between features. It furthermore creates a stable classifier that categorizes new data at predicted error rates.
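As a rough illustration of what a model-independent test of association between a spectral feature and the disease label might look like, the sketch below computes empirical mutual information on discretized peak intensities. The abstract does not name the measure the authors used, so the choice of mutual information, the fixed threshold, and the toy values of `peak_a`, `peak_b`, and `disease` are all assumptions made for this example only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def discretize(values, threshold):
    """Map each intensity to 1 (above threshold) or 0 (at or below it)."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical toy data: 4 controls (0) and 4 cases (1).
disease = [0, 0, 0, 0, 1, 1, 1, 1]
peak_a = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]  # tracks the disease label
peak_b = [1.0, 3.0, 1.1, 3.1, 0.9, 2.9, 1.2, 3.2]  # unrelated to the label

mi_a = mutual_information(discretize(peak_a, 2.0), disease)
mi_b = mutual_information(discretize(peak_b, 2.0), disease)
print(f"peak_a: {mi_a:.3f} bits, peak_b: {mi_b:.3f} bits")
```

A feature whose score exceeds some chosen cutoff would become a candidate link to the disease node; the same score computed between pairs of features could suggest feature-to-feature links. With only dozens of samples, such estimates are noisy, which is one reason a significance threshold matters.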

RESULTS

The method was applied to artificial data with known feature relationships and typical TOF-MS variability introduced, and was able to recover those relationships nearly perfectly. It was also applied to blood sera data from a 2004 leukemia study, and showed high stability of selected features under cross-validation. Verification of results using withheld data showed excellent predictive power. The method showed improvement over traditional techniques, and naturally incorporated measurement uncertainties. The relationships discovered between features allowed preliminary identification of a protein biomarker which was consistent with other cancer studies and later verified experimentally.
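One simple way to quantify the stability of selected features under cross-validation is to repeat the selection on each training split and record how often each feature is chosen. The sketch below does this with a deliberately simple score (difference of class means scaled by the overall spread); the scoring rule, the function names, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev

def score(values, labels):
    """Absolute difference of class means, scaled by the overall spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(values) or 1.0  # guard against a constant feature
    return abs(mean(a) - mean(b)) / spread

def selection_frequency(features, labels, k, top_n):
    """Fraction of the k folds in which each feature ranks in the top_n."""
    n = len(labels)
    counts = Counter()
    for fold in range(k):
        # Hold out every k-th sample, offset by the fold index.
        train = [i for i in range(n) if i % k != fold]
        y = [labels[i] for i in train]
        scores = {
            name: score([vals[i] for i in train], y)
            for name, vals in features.items()
        }
        chosen = sorted(scores, key=scores.get, reverse=True)[:top_n]
        counts.update(chosen)
    return {name: counts[name] / k for name in features}

# Hypothetical toy data: 4 controls (0) and 4 cases (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "peak_a": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # tracks the label
    "peak_b": [1.0, 3.0, 1.1, 3.1, 0.9, 2.9, 1.2, 3.2],  # unrelated
}
freq = selection_frequency(features, labels, k=4, top_n=1)
print(freq)
```

A feature selected in nearly every fold is stable in this sense; features that appear in only a few folds are more likely artifacts of a particular split.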

CONCLUSIONS

This method appears to avoid overfitting in biologic data and produce stable feature sets in a network model. The network structure provides additional information about the relationships among features that is useful to guide further biochemical analysis. In addition, when used to classify new data, these feature sets are far more consistent than those produced by many traditional techniques.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7d1/3098056/1a973b0bd48d/1471-2105-11-177-1.jpg
