

Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins.

Author information

Nie Lei, Wu Gang, Brockman Fred J, Zhang Weiwen

Affiliation

Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA.

Publication information

Bioinformatics. 2006 Jul 1;22(13):1641-7. doi: 10.1093/bioinformatics/btl134. Epub 2006 May 4.

Abstract

MOTIVATION

Integrated analysis of global-scale transcriptomic and proteomic data can provide important insights into the metabolic mechanisms underlying complex biological systems. However, because the relationship between protein abundance and mRNA expression level is complicated by many cellular and physical processes, sophisticated statistical models need to be developed to capture their relationship.

RESULTS

In this study, we describe a novel data-driven statistical model to integrate whole-genome microarray and proteomic data collected from Desulfovibrio vulgaris grown under three different conditions. Based on the Poisson distribution pattern of the proteomic data and the fact that a large number of proteins were undetected (excess zeros), zero-inflated Poisson (ZIP)-based models were proposed to define the correlation pattern between mRNA and protein abundance. In addition, by assuming that there is a probability mass at zero representing both unexpressed genes and expressed proteins that went undetected owing to technical limitations, a Potential ZIP model was established. This approach introduces two significant improvements: (1) the predicted protein abundance values for experimentally detected proteins are corrected by taking their mRNA levels into account, and (2) protein abundance values can be predicted for undetected proteins (in this study, approximately 83% of the proteins in the D. vulgaris genome), allowing better biological interpretation. We demonstrated the use of these statistical models by comparatively analyzing proteomic and microarray results from D. vulgaris grown on lactate-based versus formate-based media. These models correctly predicted increased expression of Ech hydrogenase and decreased expression of Coo hydrogenase for D. vulgaris grown on formate.
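The abstract does not give the model equations, so the following is only a minimal sketch of a generic ZIP regression, not the authors' Potential ZIP model. It assumes a single covariate x (for example a log mRNA level), a Poisson mean linked to it by log λ_i = b0 + b1·x_i, and a constant zero-inflation probability π, so that P(Y = 0) = π + (1 − π)e^(−λ) and P(Y = y) = (1 − π)e^(−λ)λ^y / y! for y > 0. The helper names (zip_pmf, fit_zip, rpois, simulate) are hypothetical, the data are synthetic, and fitting by EM is our own choice; a library such as statsmodels also provides a ZIP regression class.

```python
import math
import random


def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson with inflation probability pi."""
    pois = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi + (1.0 - pi) * pois if y == 0 else (1.0 - pi) * pois


def fit_zip(x, y, em_iters=200, newton_steps=2):
    """Hypothetical helper: fit log(lam_i) = b0 + b1 * x_i and a constant pi by EM."""
    n = len(y)
    b0, b1, pi = math.log(max(sum(y) / n, 1e-8)), 0.0, 0.5
    for _ in range(em_iters):
        # E-step: posterior probability that each observed zero is a structural zero.
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        z = [pi / (pi + (1.0 - pi) * math.exp(-li)) if yi == 0 else 0.0
             for yi, li in zip(y, lam)]
        # M-step for pi: closed form (mean posterior structural-zero probability).
        pi = sum(z) / n
        # M-step for (b0, b1): Newton steps on the Poisson log-likelihood,
        # each row weighted by 1 - z_i, its probability of belonging to the count part.
        w = [1.0 - zi for zi in z]
        for _ in range(newton_steps):
            lam = [math.exp(b0 + b1 * xi) for xi in x]
            r = [wi * (yi - li) for wi, yi, li in zip(w, y, lam)]
            h = [wi * li for wi, li in zip(w, lam)]
            g0, g1 = sum(r), sum(ri * xi for ri, xi in zip(r, x))
            h00 = sum(h)
            h01 = sum(hi * xi for hi, xi in zip(h, x))
            h11 = sum(hi * xi * xi for hi, xi in zip(h, x))
            det = h00 * h11 - h01 * h01
            # Solve the 2x2 Newton system (information matrix times step = gradient).
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, pi


def rpois(lam, rng):
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n, b0, b1, pi, rng):
    """Synthetic covariates x and counts y; hypothetical, not the paper's data."""
    x = [rng.uniform(0.0, 2.0) for _ in range(n)]
    y = [0 if rng.random() < pi else rpois(math.exp(b0 + b1 * xi), rng) for xi in x]
    return x, y


if __name__ == "__main__":
    rng = random.Random(1)
    x, y = simulate(1000, 0.5, 0.8, 0.3, rng)
    b0, b1, pi = fit_zip(x, y)
    print(f"b0={b0:.2f} b1={b1:.2f} pi={pi:.2f}")
    # Count-part mean for an undetected protein with a hypothetical covariate value 1.5.
    print(f"predicted abundance at x=1.5: {math.exp(b0 + b1 * 1.5):.2f}")
```

Reporting exp(b0 + b1·x) for an undetected protein treats its zero as a missed detection rather than a true absence. This only loosely mirrors point (2) of the abstract; the paper's actual predictor and its treatment of the zero mass may differ.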

