Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development.

Affiliations

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America.

Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America.

Publication information

PLoS Genet. 2019 Sep 25;15(9):e1008382. doi: 10.1371/journal.pgen.1008382. eCollection 2019 Sep.

Abstract

Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
