CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction.

Affiliation

Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

Interdiscip Sci. 2019 Dec;11(4):628-635. doi: 10.1007/s12539-018-0313-4. Epub 2018 Dec 27.

DOI: 10.1007/s12539-018-0313-4
PMID: 30588558
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6841655/

Abstract

Accurate gene prediction in metagenomics fragments is computationally challenging because the reads are short and the data are incomplete and fragmented. Most gene-prediction programs extract a large number of features and then apply statistical or supervised classification approaches to predict genes. In our study, we introduce CNN-MGP, a convolutional neural network for metagenomics gene prediction that predicts genes in metagenomics fragments directly from raw DNA sequences, without manual feature extraction or feature selection stages. CNN-MGP learns the characteristics of coding and non-coding regions and distinguishes coding from non-coding open reading frames (ORFs). We train 10 CNN models on 10 mutually exclusive datasets based on pre-defined GC content ranges. We extract ORFs from each fragment; the ORFs are then encoded numerically and fed into the appropriate CNN model based on the fragment's GC content. The output of the CNN is the probability that an ORF encodes a gene. Finally, a greedy algorithm selects the final gene list. Overall, CNN-MGP is effective and achieves 91% accuracy on the testing dataset. CNN-MGP demonstrates the ability of deep learning to predict genes in metagenomics fragments, and its accuracy is higher than or comparable to that of state-of-the-art gene-prediction programs that use pre-defined features.
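
The pipeline described in the abstract (ORF extraction, numeric encoding, GC-based model selection, greedy selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the GC ranges, the encoding scheme, the start codons, or the greedy criterion, so the equal-width GC bins, the one-hot encoding, the start-codon set, the non-overlap rule, and all function names here are assumptions. The `models` argument is a hypothetical list of 10 callables standing in for the trained CNNs.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed prokaryotic start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class ORF(NamedTuple):
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand
    strand: str  # "+" or "-"
    seq: str     # ORF read 5'->3', including the stop codon


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin(gc: float, n_bins: int = 10) -> int:
    """Map GC content in [0, 1] to an equal-width bin (assumed ranges)."""
    return min(int(gc * n_bins), n_bins - 1)


def find_orfs(fragment: str, min_len: int = 60) -> List[ORF]:
    """Find complete ORFs (start codon through stop codon) in six frames.

    Simplification: incomplete ORFs that run off the fragment are ignored.
    """
    orfs = []
    n = len(fragment)
    strands = (("+", fragment), ("-", fragment.translate(COMPLEMENT)[::-1]))
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        # Map reverse-strand coordinates back to the forward strand.
                        lo, hi = (start, end) if strand == "+" else (n - end, n - start)
                        orfs.append(ORF(lo, hi, strand, seq[start:end]))
                    start = None
    return orfs


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode A/C/G/T; other symbols (e.g. N) become all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq:
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def greedy_select(scored: Sequence[Tuple[ORF, float]],
                  threshold: float = 0.5) -> List[Tuple[ORF, float]]:
    """Keep ORFs in descending probability, skipping any that overlap a kept ORF."""
    chosen: List[Tuple[ORF, float]] = []
    for orf, p in sorted(scored, key=lambda item: item[1], reverse=True):
        if p < threshold:
            break
        if all(min(orf.end, c.end) <= max(orf.start, c.start) for c, _ in chosen):
            chosen.append((orf, p))
    return chosen


def predict_genes(fragment: str,
                  models: Sequence[Callable[[List[List[int]]], float]]):
    """Score each ORF with the model for the fragment's GC bin, then select greedily."""
    model = models[gc_bin(gc_content(fragment))]
    scored = [(orf, model(one_hot(orf.seq))) for orf in find_orfs(fragment)]
    return greedy_select(scored)


# Usage with a dummy scorer in place of the trained CNNs.
dummy_models = [lambda encoded: 0.9] * 10
genes = predict_genes("ATG" + "GCC" * 20 + "TAA", dummy_models)
```

Selecting the model from the fragment's GC content, rather than from each ORF's, follows the abstract's wording. The overlap rule in `greedy_select` is only one plausible reading of "greedy algorithm".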
