Cross-species enhancer prediction using machine learning.

Affiliations

The Davies Livestock Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA 5371, Australia.

BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW, Sydney, NSW 2052, Australia; School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia.

Publication information

Genomics. 2022 Sep;114(5):110454. doi: 10.1016/j.ygeno.2022.110454. Epub 2022 Aug 25.

Abstract

Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in gene expression regulation. Enhancers, an important class of CREs, interact with genes to influence complex traits such as disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms such as mouse, with little known about other mammalian species. Previous studies have attempted to identify enhancers in less studied mammals using comparative genomics, but with limited success. Recently, machine learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq. We tested nine models with four different representations of the DNA sequences in cross-species prediction, using both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6-13.7% of each genome). These proportions were close to the ~8% of the human genome covered by ELRs. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high-confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
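
The abstract does not say which four DNA sequence representations were tested. As an illustration only, the sketch below shows two representations that are commonly used as ML input for sequence classification: one-hot encoding and k-mer count vectors. The function names `one_hot` and `kmer_counts`, the choice of k, and the handling of ambiguous bases are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as one 4-element row per base (order A, C, G, T).

    Assumption for this sketch: any non-ACGT symbol (e.g. N) becomes all zeros.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_counts(seq, k=3):
    """Count overlapping k-mers over the fixed vocabulary of all 4**k k-mers.

    Windows that contain a non-ACGT symbol are not in the vocabulary,
    so they do not contribute to the returned vector.
    """
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in vocab]


if __name__ == "__main__":
    example = "ACGTNACG"  # hypothetical sequence, not from the study
    print(one_hot(example)[:2])
    print(sum(kmer_counts(example, k=2)))
```

One-hot rows preserve base order and suit convolutional or recurrent models, while k-mer count vectors have a fixed length regardless of sequence length and suit models that expect tabular features.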
