Cross-species regulatory sequence activity prediction.

Affiliation

Calico Life Sciences, South San Francisco, California, United States of America.

Publication information

PLoS Comput Biol. 2020 Jul 20;16(7):e1008050. doi: 10.1371/journal.pcbi.1008050. eCollection 2020 Jul.

DOI:10.1371/journal.pcbi.1008050

PMID:32687525

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7392335/

Abstract

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.
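The abstract does not spell out how one network is trained on several genomes at once. One plausible reading, sketched below purely as an assumption, is a shared convolutional trunk with a separate output head per species, trained on batches that alternate between the two genomes. Every name here (`TwoSpeciesModel`, `train_step`, the toy sequences and targets) is hypothetical, and the squared-error loss and single convolution layer are simplifications chosen for brevity, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x

def windows(x, k):
    """Stack every length-k window of x into shape (positions, k * 4)."""
    return np.stack([x[p:p + k].ravel() for p in range(len(x) - k + 1)])

class TwoSpeciesModel:
    """Toy model: one shared convolution layer, one linear head per species."""

    def __init__(self, k=5, n_filters=8, n_targets=None):
        # Assumed numbers of output tracks per species (illustrative only).
        n_targets = n_targets or {"human": 3, "mouse": 2}
        self.k = k
        # Shared trunk: convolution filters used for both genomes.
        self.trunk = rng.normal(0, 0.1, (k * 4, n_filters))
        # Species-specific heads: one weight matrix per genome.
        self.heads = {s: rng.normal(0, 0.1, (n_filters, t))
                      for s, t in n_targets.items()}

    def forward(self, seq, species):
        win = windows(one_hot(seq), self.k)      # (positions, k * 4)
        z = win @ self.trunk                     # convolution as a matmul
        h = np.maximum(z, 0).mean(axis=0)        # ReLU, then average pooling
        return win, z, h, h @ self.heads[species]

    def train_step(self, seq, species, target, lr=0.05):
        """One gradient step on squared error; returns the pre-update loss."""
        win, z, h, y = self.forward(seq, species)
        dy = 2 * (y - target)                    # dLoss/dy
        head = self.heads[species]
        dh = head @ dy                           # backprop into pooled features
        dz = (z > 0) * dh / len(z)               # through mean pooling and ReLU
        self.heads[species] = head - lr * np.outer(h, dy)  # active head only
        self.trunk = self.trunk - lr * (win.T @ dz)        # shared trunk
        return float(((y - target) ** 2).sum())

# Made-up training examples, one per species.
examples = {
    "human": ("ACGTACGTACGTACGT", np.array([1.0, 0.0, 2.0])),
    "mouse": ("TTGACCGATAGGCTAA", np.array([0.5, 1.5])),
}

model = TwoSpeciesModel()
for step in range(200):
    species = ("human", "mouse")[step % 2]       # alternate genomes
    seq, target = examples[species]
    loss = model.train_step(seq, species, target)
```

In this layout every step updates the shared trunk, while only the head of the species that supplied the batch changes. That is how data from a second genome can influence predictions for the first.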

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f29/7392335/d7361ad342d1/pcbi.1008050.g001.jpg
