Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge.

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA.

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.

Publication information

BMC Bioinformatics. 2019 Dec 20;20(Suppl 24):669. doi: 10.1186/s12859-019-3253-z.

PMID:31861998

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6923881/

Abstract

BACKGROUND

Proteomic measurements, which closely reflect phenotypes, provide insight into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integrating proteome- and transcriptome-level data can help validate gene signatures associated with a phenotype. However, proteomic data are less abundant than genomic data, so it is useful to predict protein abundances from genomic features when matched proteomic samples, or measurements within samples, are missing.

RESULTS

We evaluate and compare four data-driven models for predicting proteomic data from mRNA measurements in breast and ovarian cancers, using data from the 2017 DREAM Proteogenomics Challenge. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground-truth and predicted values of 0.2 to 0.5. However, the set of most accurately predicted proteins differs considerably between approaches.
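The abstract summarizes accuracy as a median correlation between ground-truth and predicted protein levels. A minimal sketch of such a metric follows, assuming a Pearson correlation computed per protein across samples and then summarized by the median over proteins. The exact challenge scoring (correlation type, treatment of missing values) is an assumption here, and `pearson` and `median_protein_correlation` are hypothetical helper names, not part of the authors' code.

```python
from math import sqrt
from statistics import median
from typing import Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation of two equal-length lists; None when undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant vector has no defined correlation
    return sxy / sqrt(sxx * syy)


def median_protein_correlation(
    observed: Dict[str, List[Optional[float]]],
    predicted: Dict[str, List[float]],
) -> float:
    """Median over proteins of the per-protein observed-vs-predicted correlation.

    Assumption: samples whose observed abundance is missing (None) are
    skipped, and proteins with an undefined correlation are left out.
    """
    scores = []
    for protein, obs in observed.items():
        pred = predicted[protein]
        pairs = [(o, p) for o, p in zip(obs, pred) if o is not None]
        r = pearson([o for o, _ in pairs], [p for _, p in pairs])
        if r is not None:
            scores.append(r)
    if not scores:
        raise ValueError("no protein has a defined correlation")
    return median(scores)


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not challenge data).
    observed = {
        "P1": [1.0, 2.0, 3.0, None],
        "P2": [2.0, 1.0, 4.0, 3.0],
        "P3": [5.0, 5.0, 5.0, 5.0],  # constant: skipped
        "P4": [1.0, 2.0, 3.0, 4.0],
    }
    predicted = {
        "P1": [2.0, 4.0, 6.0, 0.0],
        "P2": [1.0, 2.0, 3.0, 4.0],
        "P3": [1.0, 2.0, 3.0, 4.0],
        "P4": [4.0, 3.0, 2.0, 1.0],
    }
    print(median_protein_correlation(observed, predicted))
```

In the toy example the per-protein correlations are 1.0, 0.6, and -1.0 (P3 is skipped), so the median is 0.6. Taking a median rather than a mean keeps a few very poorly or very well predicted proteins from dominating the summary, which matters when, as reported above, different methods succeed on different proteins.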

CONCLUSIONS

In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
