Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence.

Affiliations

Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853.

Publication information

Proc Natl Acad Sci U S A. 2019 Mar 19;116(12):5542-5549. doi: 10.1073/pnas.1814551116. Epub 2019 Mar 6.

Abstract

Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family-guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated within the context of mRNA expression level prediction and yielded area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Model weight inspections showed biologically interpretable patterns, resulting in the hypothesis that the 3' UTR is more important for fine-tuning mRNA abundance levels while the 5' UTR is more important for large-scale changes.
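
The gene-family-guided splitting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `family_guided_split`, the `test_fraction` parameter, and the toy gene and family identifiers are all hypothetical, and the only property it is meant to show is that every gene family lands entirely in the training set or entirely in the test set.

```python
import random
from collections import defaultdict


def family_guided_split(gene_to_family, test_fraction=0.2, seed=0):
    """Split genes into train/test lists so that no gene family spans both.

    Hypothetical helper for illustration only; the paper's actual
    procedure may differ in how families are defined and balanced.
    """
    # Group genes by their family identifier.
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # Shuffle whole families (not individual genes) reproducibly.
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    # Assign entire families to the test set until it reaches the
    # requested share of genes; all remaining families go to training.
    target = test_fraction * len(gene_to_family)
    train, test = [], []
    for family in family_ids:
        if len(test) < target:
            test.extend(families[family])
        else:
            train.extend(families[family])
    return train, test


# Toy data (assumed identifiers, not from the paper).
gene_to_family = {
    "geneA1": "fam1", "geneA2": "fam1",
    "geneB1": "fam2",
    "geneC1": "fam3", "geneC2": "fam3",
    "geneD1": "fam4",
}

train_genes, test_genes = family_guided_split(gene_to_family, test_fraction=0.3)
train_families = {gene_to_family[g] for g in train_genes}
test_families = {gene_to_family[g] for g in test_genes}
print("train:", sorted(train_genes))
print("test:", sorted(test_genes))
```

Because assignment happens at the family level, the realized test share only approximates `test_fraction`; what is guaranteed is that paralogs with similar sequences never sit on both sides of the split.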
