

Predicting gene regulatory regions with a convolutional neural network for processing double-strand genome sequence information.

Affiliation

Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Chuo-ku, Kobe, Hyogo, Japan.

Publication information

PLoS One. 2020 Jul 23;15(7):e0235748. doi: 10.1371/journal.pone.0235748. eCollection 2020.

Abstract

With advances in sequencing technology, a vast amount of genomic sequence information has become available. However, annotating biological functions, particularly those of non-protein-coding regions, in genome sequences without experiments remains a challenging task. Recently, deep learning-based methods were shown to be able to predict gene regulatory regions from genome sequences, promising to aid the interpretation of genomic sequence data. Here, we report improved prediction accuracy for gene regulatory regions, achieved with a convolution layer design that efficiently processes genomic sequence information, and present DeepGMAP, software for training and comparing different deep learning-based models (https://github.com/koonimaru/DeepGMAP). First, we demonstrate that our convolution layers, termed forward- and reverse-sequence scan (FRSS) layers, integrate both forward and reverse strand information and enhance the power to predict gene regulatory regions. Second, we assessed previous studies and identified problems associated with data structures that caused overfitting. Finally, we introduce visualization methods to examine what the program learned. Together, our FRSS layers improve the prediction accuracy for gene regulatory regions.
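The abstract does not spell out how the FRSS layers combine the two strands, so the following is only a minimal sketch of the general idea of scanning both strands with one shared filter. It is not the DeepGMAP implementation. The function names (`one_hot`, `scan`, `two_strand_scan`), the toy motif, and the choice to merge strands with an element-wise maximum are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the DeepGMAP / FRSS implementation.
# Assumption: the two strands are merged by an element-wise maximum.

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def reverse_complement(seq):
    """Return the reverse-complement strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def scan(encoded, kernel):
    """Slide a (width x 4) kernel along a one-hot sequence (no padding)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def two_strand_scan(seq, kernel):
    """Scan both strands with the same kernel and keep the larger score."""
    forward = scan(one_hot(seq), kernel)
    reverse = scan(one_hot(reverse_complement(seq)), kernel)
    reverse.reverse()  # re-align reverse-strand windows to forward coordinates
    return [max(f, r) for f, r in zip(forward, reverse)]


if __name__ == "__main__":
    motif_kernel = one_hot("TGA")  # hypothetical motif used as a filter
    sequence = "AATCACC"           # carries the motif on the reverse strand only
    print("forward only:", scan(one_hot(sequence), motif_kernel))
    print("both strands:", two_strand_scan(sequence, motif_kernel))
```

In this toy case the motif sits only on the reverse strand, so a forward-only scan never reaches the full match score, while the two-strand scan does. A trained network would learn its filters rather than use a fixed motif, and the published layers may integrate the strands differently.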


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f671/7377372/68447ed6aba9/pone.0235748.g001.jpg
