Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci.

Affiliations

Central European Institute of Technology, Brno, Czech Republic.

Department of Electrical and Computer Engineering, School of Engineering, University of Thessaly, Volos, Greece.

Publication information

Sci Rep. 2020 Jun 11;10(1):9486. doi: 10.1038/s41598-020-66454-3.

Abstract

Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge, owing to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD to identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets for identification of novel small RNA loci, both intra- and inter-species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
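The scanning idea described in the abstract, sliding a fixed-size window along a long genomic sequence and scoring each window with a trained classifier, can be illustrated with a minimal sketch. This is not MuStARD's code or interface: the function names, the window size, the step, and the threshold are all hypothetical, and a toy GC-content scorer stands in for a trained network. Only a sequence input is shown; the secondary structure and conservation inputs mentioned in the abstract are omitted.

```python
from typing import Callable, Iterator, List, Tuple

import numpy as np

BASES = "ACGT"  # column order of the one-hot encoding (an assumption)


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len, 4) matrix; unknown bases such as N stay all-zero."""
    mat = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every full-length window."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def scan(
    seq: str,
    score_fn: Callable[[np.ndarray], float],
    size: int = 100,        # hypothetical window size
    step: int = 5,          # hypothetical step
    threshold: float = 0.5  # hypothetical score cutoff
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for windows scoring at or above the threshold."""
    hits = []
    for start, win in windows(seq, size, step):
        score = float(score_fn(one_hot(win)))
        if score >= threshold:
            hits.append((start, start + size, score))
    return hits


def gc_fraction(encoded: np.ndarray) -> float:
    """Toy stand-in for a trained model: fraction of C/G bases in the window."""
    return float(encoded[:, 1:3].sum()) / max(len(encoded), 1)


if __name__ == "__main__":
    genome = "AT" * 10 + "GC" * 10 + "AT" * 10
    for start, end, score in scan(genome, gc_fraction, size=20, step=10, threshold=0.9):
        print(start, end, score)
```

In a real setting, `score_fn` would be replaced by a call to a trained model, and overlapping windows above the threshold would typically be merged into candidate loci.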
