Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis.

Affiliations

Anesthesiology and Intensive Care Medicine, University Hospital Greifswald, Ferdinand-Sauerbruch-Straße, D-17475 Greifswald, Germany.

Epidemiology and Biostatistics, School of Public Health, Imperial College London, Norfolk Place, London, W2 1PG, UK.

Publication information

Bioinformatics. 2015 Oct 1;31(19):3156-62. doi: 10.1093/bioinformatics/btv334. Epub 2015 May 28.

Abstract

MOTIVATION

Proteomic mass spectrometry analysis is becoming routine in clinical diagnostics, for example to monitor cancer biomarkers using blood samples. However, differential proteomics and the identification of peaks relevant for class separation remain challenging.

RESULTS

Here, we introduce a simple yet effective approach for identifying differentially expressed proteins using binary discriminant analysis. The approach dichotomizes protein expression values with data-adaptive thresholds and then ranks the resulting binary features using a relative entropy measure. Our framework can be viewed as a generalization of the 'peak probability contrast' approach of Tibshirani et al. (2004) and applies in both the two-group and the multi-group setting. It is computationally inexpensive, and in the analysis of a large-scale drug discovery test dataset it achieves prediction accuracy equivalent to that of a random forest. Furthermore, in the analysis of mass spectrometry data from a pancreatic cancer study, we identified biologically relevant and statistically predictive marker peaks that were not recognized in the original study.
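
The thresholding-and-ranking idea can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the binda implementation. We assume that the "data-adaptive" threshold for each peak is the midpoint between consecutive observed values that maximizes the ranking score. We also take the "relative entropy measure" to be the class-frequency-weighted KL divergence of each class-conditional Bernoulli distribution from the pooled one, which equals the mutual information between the dichotomized peak and the class label. The classifier is a conditionally independent Bernoulli model with Laplace smoothing. The smoothing choice, the helper names (`entropy_score`, `best_threshold`, `rank_features`, `fit_binary_da`, `predict`), and the toy data are all hypothetical.

```python
import math
from collections import Counter


def kl_bernoulli(p, q):
    """Relative entropy KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    out = 0.0
    if p > 0:
        out += p * math.log(p / q)
    if p < 1:
        out += (1 - p) * math.log((1 - p) / (1 - q))
    return out


def entropy_score(bits, labels):
    """Class-weighted KL of each class-conditional Bernoulli from the pooled
    one; equals the mutual information between the 0/1 feature and the label."""
    n = len(bits)
    mu = sum(bits) / n  # pooled frequency of 1s
    score = 0.0
    for k, n_k in Counter(labels).items():
        mu_k = sum(b for b, y in zip(bits, labels) if y == k) / n_k
        score += (n_k / n) * kl_bernoulli(mu_k, mu)
    return score


def best_threshold(values, labels):
    """Hypothetical data-adaptive rule: try midpoints between consecutive
    distinct values, keep the one whose dichotomization scores highest."""
    distinct = sorted(set(values))
    best_t, best_s = distinct[0], 0.0
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        s = entropy_score([1 if v > t else 0 for v in values], labels)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s


def rank_features(X, labels):
    """X holds one row of peak intensities per sample. Returns
    (feature_index, threshold, score) tuples, best-scoring feature first."""
    ranking = [(j, *best_threshold([row[j] for row in X], labels))
               for j in range(len(X[0]))]
    return sorted(ranking, key=lambda r: r[2], reverse=True)


def fit_binary_da(B, labels, alpha=1.0):
    """Per-class priors and Laplace-smoothed frequencies of 1s (a stand-in;
    binda's own frequency estimator may differ)."""
    n, model = len(labels), {}
    for k, n_k in Counter(labels).items():
        rows = [b for b, y in zip(B, labels) if y == k]
        freqs = [(sum(r[j] for r in rows) + alpha) / (n_k + 2 * alpha)
                 for j in range(len(B[0]))]
        model[k] = (n_k / n, freqs)
    return model


def predict(model, bits):
    """Pick the class maximizing log prior + Bernoulli log-likelihood,
    treating the binary features as conditionally independent."""
    def log_score(k):
        prior, freqs = model[k]
        return math.log(prior) + sum(
            math.log(f if x else 1 - f) for x, f in zip(bits, freqs))
    return max(model, key=log_score)


# Toy data (invented): peak 0 separates the groups, peaks 1 and 2 do not.
X = [[5.0, 1.0, 3.0], [6.0, 2.0, 2.9], [5.5, 1.5, 3.1],
     [1.0, 1.2, 3.0], [0.5, 1.8, 2.8], [1.5, 1.1, 3.2]]
y = ["cancer"] * 3 + ["control"] * 3

ranking = rank_features(X, y)
top = ranking[:1]  # keep only the best-ranked peak
B = [[1 if row[j] > t else 0 for j, t, _ in top] for row in X]
model = fit_binary_da(B, y)
print(ranking[0][:2])  # (0, 3.25)
query = [5.2, 1.3, 3.0]
print(predict(model, [1 if query[j] > t else 0 for j, t, _ in top]))  # cancer
```

The score sums over however many classes appear in `labels`, so the same code covers both the two-group and the multi-group setting. Because thresholds are chosen on the same samples used for ranking, scores on real data will be optimistic; the threshold search and feature selection would normally be nested inside cross-validation.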

AVAILABILITY AND IMPLEMENTATION

The methodology for binary discriminant analysis is implemented in the R package binda, which is freely available under the GNU General Public License (version 3 or later) from CRAN at http://cran.r-project.org/web/packages/binda/. R scripts reproducing all described analyses are available from the web page http://strimmerlab.org/software/binda/.

CONTACT

k.strimmer@imperial.ac.uk.

