Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins.

Affiliation

Department of Industrial, Systems and Operations Engineering, Tempe, AZ 85287-5906, USA.

Publication information

Bioinformatics. 2009 Aug 1;25(15):1905-14. doi: 10.1093/bioinformatics/btp325. Epub 2009 May 15.

DOI: 10.1093/bioinformatics/btp325
PMID: 19447782
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2712339/
Abstract

MOTIVATION

Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. Proteomic data, by contrast, remain scarce, because the identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. When only partial proteomic data are available, an integrative transcriptomic and proteomic analysis is likely to introduce significant bias. Developing methods to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into the metabolic mechanisms underlying complex biological systems.

RESULTS

In this study, we present a non-linear, data-driven model that predicts the abundance of undetected proteins, using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) for two purposes: to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict the abundance of proteins that were not detected experimentally. The predictors include mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. We first constructed a GBT model using all candidate variables to assess their relative importance and to characterize the behavior of the predictive model. This model showed a strong plateau effect in regions of high mRNA values and sparse data. We therefore removed the genes in those regions, using thresholds estimated from the partial dependence plots that captured this behavior. At this stage, only the strongest predictors of protein abundance were retained, to reduce the complexity of the GBT model. After the genes in the plateau region were removed, mRNA abundance, the main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance. We then built a new, tuned GBT model using the five most significant predictors. Our non-linear model consists of a series of regression trees fitted sequentially, which gives it an implicit capacity for variable selection. The model also provides relative importance measures for the variables, using mean squared error as the criterion. The coefficients of determination of our non-linear models ranged from 0.393 to 0.582 across the two datasets, which is better than the linear regression used in previous work.
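To make the modelling steps above concrete, here is a minimal, pure-Python sketch of stochastic gradient boosting with squared-error loss. It is not the authors' code. The abstract does not give tree depth or tuning values, so the following are illustrative assumptions: depth-1 trees (stumps), 200 trees, learning rate 0.1 and subsample fraction 0.5. The helper names (`best_stump`, `fit_gbt`, `partial_dependence`) are hypothetical, and the two-feature data are synthetic.

The sketch shows four pieces:
- Each tree is fitted to the current residuals on a random subsample of rows.
- Variable importance accumulates each split's reduction in squared error.
- A partial dependence curve exposes a plateau like the one described for high mRNA values.
- R² is computed as 1 − SS_res/SS_tot.

```python
import random
from statistics import mean

def best_stump(X, r, idx):
    """Best single split for residuals r on rows idx.

    Returns (gain, feature, threshold, left_mean, right_mean), where gain is
    the reduction in squared error, or None if no feature can be split.
    """
    n = len(idx)
    total = sum(r[i] for i in idx)
    best = None
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates tied values
            right = total - left
            # SSE reduction of a two-leaf split (between-group sum of squares)
            gain = left ** 2 / k + right ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left / k, right / (n - k))
    return best

def apply_stump(stump, x):
    j, thr, left, right = stump
    return left if x[j] <= thr else right

def fit_gbt(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with stumps under squared-error loss."""
    rng = random.Random(seed)
    f0 = mean(y)  # initial constant model
    pred = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    m = min(len(y), max(2, int(subsample * len(y))))
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        idx = rng.sample(range(len(y)), m)            # random subsample, no replacement
        split = best_stump(X, resid, idx)
        if split is None:
            break
        gain, j, thr, left, right = split
        importance[j] += gain  # accumulate squared-error reduction per feature
        stump = (j, thr, learning_rate * left, learning_rate * right)
        stumps.append(stump)
        pred = [p + apply_stump(stump, x) for p, x in zip(pred, X)]
    return f0, stumps, importance

def predict(model, x):
    f0, stumps, _ = model
    return f0 + sum(apply_stump(s, x) for s in stumps)

def r_squared(y, yhat):
    ybar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def partial_dependence(model, X, j, grid):
    """Average prediction with feature j fixed at each grid value."""
    return [mean(predict(model, x[:j] + [v] + x[j + 1:]) for x in X) for v in grid]

# Toy data, made up for illustration: feature 0 stands in for (log) mRNA
# abundance with a plateau above 2.0; feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.uniform(0, 4), rng.uniform(0, 4)] for _ in range(200)]
y = [min(x[0], 2.0) + rng.gauss(0, 0.1) for x in X]

model = fit_gbt(X, y)
r2 = r_squared(y, [predict(model, x) for x in X])
importance = model[2]
pd = partial_dependence(model, X, 0, [0.5, 1.5, 2.5, 3.5])
print(f"training R^2 = {r2:.3f}, importance = {importance}")
print("partial dependence on feature 0:", pd)
```

On this toy data, the partial dependence curve should flatten above 2.0, and the noise feature should get little importance.

R² is computed on the training data here only for brevity. A held-out set or cross-validation gives a fairer estimate.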
We evaluated the validity of this non-linear model using biological information on operons, regulons and pathways. The results showed that the coefficients of variation of the estimated protein abundances within operons, regulons or pathways are indeed smaller than those of random groups of proteins.
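The validation above compares within-group variability with that of random groups. The abstract does not say how the random groups were formed or which standard deviation was used. One plausible setup, assumed here, works as follows:
- Draw random groups with the same sizes as the real groups.
- Compute the mean within-group coefficient of variation (CV = standard deviation / mean, here the population standard deviation).
- Report the fraction of random draws whose mean CV is at most the observed value.

The protein IDs, abundances and operon memberships below are hypothetical.

```python
import random
from statistics import mean, pstdev

def cv(values):
    """Coefficient of variation: population SD / mean (assumes positive values)."""
    return pstdev(values) / mean(values)

def mean_group_cv(estimates, groups):
    """Average CV of estimated abundances within each group of >= 2 proteins."""
    return mean(cv([estimates[p] for p in g]) for g in groups if len(g) >= 2)

def random_group_cvs(estimates, groups, n_perm=1000, seed=0):
    """Null distribution: the same statistic for random groups of matching sizes."""
    rng = random.Random(seed)
    proteins = list(estimates)
    sizes = [len(g) for g in groups if len(g) >= 2]
    return [mean(cv([estimates[p] for p in rng.sample(proteins, s)]) for s in sizes)
            for _ in range(n_perm)]

# Hypothetical protein IDs and estimated abundances, for illustration only.
estimates = {"p1": 10.0, "p2": 11.0, "p3": 10.5, "p4": 2.0,
             "p5": 2.2, "p6": 6.0, "p7": 15.0, "p8": 1.0}
operons = [["p1", "p2", "p3"], ["p4", "p5"]]

observed = mean_group_cv(estimates, operons)
null = random_group_cvs(estimates, operons)
p_value = sum(c <= observed for c in null) / len(null)
print(f"observed mean CV = {observed:.3f}, empirical p = {p_value:.3f}")
```

A small empirical p-value means that proteins grouped by operon have more similar estimated abundances than random groups of the same sizes. This is the kind of consistency the authors use as biological support for the predictions.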

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets.
   Mol Biosyst. 2011 Apr;7(4):1093-104. doi: 10.1039/c0mb00260g. Epub 2011 Jan 7.
2. Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins.
   Bioinformatics. 2006 Jul 1;22(13):1641-7. doi: 10.1093/bioinformatics/btl134. Epub 2006 May 4.
3. Integrated Analysis of Transcriptomic and Proteomic Datasets Reveals Information on Protein Expressivity and Factors Affecting Translational Efficiency.
   Methods Mol Biol. 2016;1375:123-36. doi: 10.1007/7651_2015_242.
4. Prediction and Characterization of Missing Proteomic Data in Desulfovibrio vulgaris.
   Comp Funct Genomics. 2011;2011:780973. doi: 10.1155/2011/780973. Epub 2011 May 4.
5. Correlation between mRNA and protein abundance in Desulfovibrio vulgaris: a multiple regression to identify sources of variations.
   Biochem Biophys Res Commun. 2006 Jan 13;339(2):603-10. doi: 10.1016/j.bbrc.2005.11.055. Epub 2005 Nov 17.
6. Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: a quantitative analysis.
   Genetics. 2006 Dec;174(4):2229-43. doi: 10.1534/genetics.106.065862. Epub 2006 Oct 8.
7. LC-MS/MS based proteomic analysis and functional inference of hypothetical proteins in Desulfovibrio vulgaris.
   Biochem Biophys Res Commun. 2006 Nov 3;349(4):1412-9. doi: 10.1016/j.bbrc.2006.09.019. Epub 2006 Sep 15.
8. Transcriptomic and proteomic analyses of Desulfovibrio vulgaris biofilms: carbon and energy flow contribute to the distinct biofilm growth state.
   BMC Genomics. 2012 Apr 16;13:138. doi: 10.1186/1471-2164-13-138.
9. A proteomic view of Desulfovibrio vulgaris metabolism as determined by liquid chromatography coupled with tandem mass spectrometry.
   Proteomics. 2006 Aug;6(15):4286-99. doi: 10.1002/pmic.200500930.

Cited by

1. Workability of mRNA Sequencing for Predicting Protein Abundance.
   Genes (Basel). 2023 Nov 11;14(11):2065. doi: 10.3390/genes14112065.
2. Robust Score Tests With Missing Data in Genomics Studies.
   J Am Stat Assoc. 2019;114(528):1778-1786. doi: 10.1080/01621459.2018.1514304. Epub 2019 Feb 26.
3. Machine Learning and Integrative Analysis of Biomedical Big Data.
   Genes (Basel). 2019 Jan 28;10(2):87. doi: 10.3390/genes10020087.
4. Proteomics and phosphoproteomics in precision medicine: applications and challenges.
   Brief Bioinform. 2019 May 21;20(3):767-777. doi: 10.1093/bib/bbx141.
5. Identifying Aspects of the Post-Transcriptional Program Governing the Proteome of the Green Alga Micromonas pusilla.
   PLoS One. 2016 Jul 19;11(7):e0155839. doi: 10.1371/journal.pone.0155839. eCollection 2016.
6. An integrative imputation method based on multi-omics datasets.
   BMC Bioinformatics. 2016 Jun 21;17:247. doi: 10.1186/s12859-016-1122-6.
7. Genetic basis for nitrate resistance in Desulfovibrio strains.
   Front Microbiol. 2014 Apr 21;5:153. doi: 10.3389/fmicb.2014.00153. eCollection 2014.
8. Predicting the dynamics of protein abundance.
   Mol Cell Proteomics. 2014 May;13(5):1330-40. doi: 10.1074/mcp.M113.033076. Epub 2014 Feb 16.
9. Multi-omic network signatures of disease.
   Front Genet. 2014 Jan 7;4:309. doi: 10.3389/fgene.2013.00309.
10. Integrated analysis of transcriptomic and proteomic data.
   Curr Genomics. 2013 Apr;14(2):91-110. doi: 10.2174/1389202911314020003.

References

1. A working guide to boosted regression trees.
   J Anim Ecol. 2008 Jul;77(4):802-13. doi: 10.1111/j.1365-2656.2008.01390.x. Epub 2008 Apr 8.
2. Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications.
   Crit Rev Biotechnol. 2007 Apr-Jun;27(2):63-75. doi: 10.1080/07388550701334212.
3. Boosted trees for ecological modeling and prediction.
   Ecology. 2007 Jan;88(1):243-51. doi: 10.1890/0012-9658(2007)88[243:btfema]2.0.co;2.
4. Exploring glycopeptide-resistance in Staphylococcus aureus: a combined proteomics and transcriptomics approach for the identification of resistance-related markers.
   BMC Genomics. 2006 Nov 22;7:296. doi: 10.1186/1471-2164-7-296.
5. Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: a quantitative analysis.
   Genetics. 2006 Dec;174(4):2229-43. doi: 10.1534/genetics.106.065862. Epub 2006 Oct 8.
6. A proteomic view of Desulfovibrio vulgaris metabolism as determined by liquid chromatography coupled with tandem mass spectrometry.
   Proteomics. 2006 Aug;6(15):4286-99. doi: 10.1002/pmic.200500930.
7. Global transcriptomic analysis of Desulfovibrio vulgaris on different electron donors.
   Antonie Van Leeuwenhoek. 2006 Feb;89(2):221-37. doi: 10.1007/s10482-005-9024-z. Epub 2006 May 5.
8. Salt stress in Desulfovibrio vulgaris Hildenborough: an integrated genomics approach.
   J Bacteriol. 2006 Jun;188(11):4068-78. doi: 10.1128/JB.01921-05.
9. Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins.
   Bioinformatics. 2006 Jul 1;22(13):1641-7. doi: 10.1093/bioinformatics/btl134. Epub 2006 May 4.
10. OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments.
   BMC Bioinformatics. 2006 Jan 13;7:19. doi: 10.1186/1471-2105-7-19.