Bridging the gap between transcriptome and proteome measurements identifies post-translationally regulated genes.

Affiliation

School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK.

Publication information

Bioinformatics. 2013 Dec 1;29(23):3060-6. doi: 10.1093/bioinformatics/btt537. Epub 2013 Sep 16.

Abstract

MOTIVATION

Although much dynamic cellular behaviour is achieved through accurate regulation of protein concentrations, messenger RNA (mRNA) abundances are still widely used as proxies for protein measurements. These abundances are measured by microarray technology and, more recently, by deep sequencing. For some species and under some conditions there is good correlation between transcriptome- and proteome-level measurements. Such correlation is by no means universal, however, because of post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we develop a data-driven machine learning approach to bridge the gap between these two levels of high-throughput omic measurements in Saccharomyces cerevisiae. We then deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation.

RESULTS

Feature selection by sparsity-inducing regression (l₁-norm regularization) yields a stable set of five features: mRNA abundance, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias. This reduces the feature count from 37 to 5. A linear predictor using these features predicts protein concentrations fairly accurately (R² = 0.86). Some proteins have concentrations that cannot be predicted accurately; these are treated as outliers with respect to the predictor. They carry significantly more annotation evidence of post-translational modification than random subsets of similar size (P < 0.02). In a data mining sense, this work also illustrates a broader point: outliers with respect to a learning method can carry meaningful information about a problem domain.
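
The abstract gives no implementation details. As an illustration only, the pipeline described under RESULTS can be sketched in pure Python under several assumptions:

- the l₁-regularized regression is solved by coordinate descent on standardized features;
- the linear predictor is an unpenalized refit on the selected features;
- an "outlier" is any protein whose residual exceeds a hypothetical 2.5 residual standard deviations;
- significance is an empirical P-value over random gene subsets of the same size.

The data are synthetic, not the yeast measurements. Every function name and setting (`lasso_cd`, `lam=0.2`, `k=2.5`, 1000 permutations) is a hypothetical choice, not the authors'.

```python
import random
import statistics


def standardize(columns):
    # Scale each feature column to zero mean and unit population variance.
    out = []
    for col in columns:
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        out.append([(v - mu) / sd for v in col])
    return out


def soft_threshold(z, lam):
    # Proximal operator of the l1 norm: shrink z towards zero by lam.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, n_iter=200):
    # Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, assuming
    # standardized columns and centred y (so no intercept term is needed).
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # y - Xb with b = 0
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta


def r_squared(y, y_hat):
    mean_y = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def residual_outliers(y, y_hat, k=2.5):
    # Hypothetical rule: |residual| > k population standard deviations of all residuals.
    resid = [a - b for a, b in zip(y, y_hat)]
    sd = statistics.pstdev(resid)
    return [i for i, r in enumerate(resid) if abs(r) > k * sd]


def enrichment_p_value(outliers, annotated, n_genes, n_perm=1000, seed=0):
    # Empirical P: fraction of random gene subsets of the same size that contain
    # at least as many annotated genes as the outlier set (with +1 correction).
    rng = random.Random(seed)
    observed = sum(g in annotated for g in outliers)
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(range(n_genes), len(outliers))
        hits += sum(g in annotated for g in subset) >= observed
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-in for the yeast data: 200 "genes" x 6 features, where only
# features 0-2 drive "protein level" and 8 planted genes get an offset the
# features cannot explain (playing the role of post-translationally regulated proteins).
rng = random.Random(42)
n, p = 200, 6
raw = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
planted = set(rng.sample(range(n), 8))
y = []
for i in range(n):
    offset = (5.0 if i % 2 == 0 else -5.0) if i in planted else 0.0
    y.append(2.0 * raw[0][i] + raw[1][i] - 0.5 * raw[2][i] + offset + rng.gauss(0, 0.5))

cols = standardize(raw)
y_mean = statistics.fmean(y)
yc = [v - y_mean for v in y]
selected = [j for j, b in enumerate(lasso_cd(cols, yc, lam=0.2)) if b != 0.0]
# Refit an unpenalised linear predictor (lam = 0) on the selected features only.
coef = lasso_cd([cols[j] for j in selected], yc, lam=0.0)
y_hat = [y_mean + sum(c * cols[j][i] for c, j in zip(coef, selected)) for i in range(n)]
r2 = r_squared(y, y_hat)
outliers = residual_outliers(y, y_hat)
annotated = planted | set(rng.sample(range(n), 20))  # hypothetical PTM annotations
p_val = enrichment_p_value(outliers, annotated, n)
print("selected:", selected, "R2:", round(r2, 3), "outliers:", sorted(outliers), "P:", p_val)
```

Reusing the coordinate-descent routine with `lam=0.0` for the refit keeps the sketch dependency-free. In practice an established implementation such as scikit-learn's `Lasso` or `LassoCV` would be the more usual choice, with the penalty chosen by cross-validation rather than fixed by hand.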
