Combining gene prediction methods to improve metagenomic gene annotation.

Affiliation

Genomic Signal Processing Laboratory, Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA.

Publication information

BMC Bioinformatics. 2011 Jan 13;12:20. doi: 10.1186/1471-2105-12-20.

DOI: 10.1186/1471-2105-12-20
PMID: 21232129
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3042383/

Abstract

BACKGROUND

Traditional gene annotation methods rely on characteristics that may not be available in the short reads generated by next-generation sequencing technology, resulting in suboptimal performance on metagenomic (environmental) samples. In recent years, new programs have therefore been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve gene annotation of metagenomic reads.

RESULTS

Like similar studies, we analyze the programs' performance at different read lengths, but we also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false positives and false negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show a significant improvement in specificity at minimal cost to sensitivity, resulting in a 4% improvement in accuracy for 100 bp reads and a ~1% improvement in accuracy for reads of 200 bp and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset.
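The sensitivity, specificity, and accuracy figures quoted above can be illustrated with a minimal sketch that scores per-read coding calls against known labels. It uses the standard textbook definitions, which the abstract does not spell out, so the paper's exact formulas may differ. The function name `confusion_metrics` and the toy labels are hypothetical.

```python
def confusion_metrics(truth, predicted):
    """Return (sensitivity, specificity, accuracy) for per-read coding calls.

    truth, predicted: equal-length sequences of bools (True = coding read).
    Standard definitions are assumed; the abstract does not give formulas.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # coding, called coding
    tn = sum(1 for t, p in pairs if not t and not p)  # non-coding, called non-coding
    fp = sum(1 for t, p in pairs if not t and p)      # non-coding, called coding
    fn = sum(1 for t, p in pairs if t and not p)      # coding, called non-coding
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return sensitivity, specificity, accuracy


# Toy example (made-up labels, not data from the paper).
truth = [True, True, True, False, False]
predicted = [True, True, False, True, False]
sens, spec, acc = confusion_metrics(truth, predicted)
```

A predictor that over-calls coding regions raises `fp`, which lowers specificity while leaving sensitivity untouched. That is the pattern the abstract reports for the individual programs.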

CONCLUSIONS

To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
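The length-dependent rule in this conclusion can be sketched as follows. This is an illustration, not the authors' implementation. The function name `combine_predictions`, the `calls` dictionary layout, and the `"third_predictor"` key are hypothetical, since the abstract names only GeneMark and Orphelia. The abstract does not say how to treat reads strictly between 400 and 500 bp, so the sketch assumes they are handled like long reads.

```python
def combine_predictions(read_length, calls, short_max=400):
    """Combine per-read coding calls from several gene predictors.

    read_length: read length in bp.
    calls: dict mapping predictor name -> bool (True = read called coding).
           Must contain "GeneMark" and "Orphelia" for the long-read rule.
    short_max: longest read (bp) handled by majority vote. Reads longer than
               this use the GeneMark/Orphelia intersection (an assumption for
               lengths between 400 and 500 bp, which the abstract leaves open).
    """
    if read_length <= short_max:
        # Majority vote: more than half of the predictors call the read coding.
        return sum(calls.values()) * 2 > len(calls)
    # Intersection: both GeneMark and Orphelia must call the read coding.
    return calls["GeneMark"] and calls["Orphelia"]


# Hypothetical calls for one read; "third_predictor" is a placeholder name.
calls = {"GeneMark": True, "Orphelia": False, "third_predictor": True}
short_read_call = combine_predictions(100, calls)  # majority vote applies
long_read_call = combine_predictions(500, calls)   # intersection applies
```

With these calls the 100 bp read is accepted as coding (two of three votes), while the 500 bp read is rejected because Orphelia disagrees. The stricter intersection trades some sensitivity for specificity.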
