
Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure.

Affiliation

Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.

Publication information

Microb Genom. 2022 May;8(5). doi: 10.1099/mgen.0.000823.

DOI:10.1099/mgen.0.000823
PMID:35503723
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9465069/
Abstract

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
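The abstract names intergenic distance, gene density and gene length as the most important features. As a rough illustration of what such per-contig features could look like, here is a minimal sketch that derives them from predicted gene coordinates. It is not the authors' implementation: the `Gene` class, the `gene_structure_features` function, the per-kb normalization and the handling of overlapping genes are all assumptions made for this example, and Whokaryote's actual feature definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Gene:
    """Hypothetical container for one predicted gene on a contig."""
    start: int  # 1-based inclusive start coordinate
    end: int    # 1-based inclusive end coordinate


def gene_structure_features(genes, contig_length):
    """Return illustrative gene-structure features for a single contig."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    ordered = sorted(genes, key=lambda g: g.start)
    lengths = [g.end - g.start + 1 for g in ordered]
    # Distance between consecutive genes; overlapping genes count as 0 (assumption).
    gaps = [
        max(0, nxt.start - cur.end - 1)
        for cur, nxt in zip(ordered, ordered[1:])
    ]
    return {
        # Genes per 1,000 bp of contig (normalization chosen for this sketch).
        "gene_density": len(ordered) * 1000 / contig_length,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
    }


# Made-up contig of 10,000 bp with three predicted genes.
example = gene_structure_features(
    [Gene(101, 1000), Gene(1201, 2100), Gene(5001, 5600)],
    contig_length=10_000,
)
print(example)
```

A feature dictionary like this, computed for each contig, is the kind of input a random forest classifier could be trained on. Under the same assumptions, the Whokaryote+Tiara variant described above would add a Tiara prediction as one more feature.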


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98fc/9465069/d7caf201a808/mgen-8-823-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98fc/9465069/42c71a03739d/mgen-8-823-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98fc/9465069/13f553a6423a/mgen-8-823-g002.jpg

Similar articles

1
Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure.
Microb Genom. 2022 May;8(5). doi: 10.1099/mgen.0.000823.
2
MetaEuk-sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics.
Microbiome. 2020 Apr 3;8(1):48. doi: 10.1186/s40168-020-00808-x.
3
Tiara: deep learning-based classification system for eukaryotic sequences.
Bioinformatics. 2022 Jan 3;38(2):344-350. doi: 10.1093/bioinformatics/btab672.
4
Improvement of eukaryotic protein predictions from soil metagenomes.
Sci Data. 2022 Jun 16;9(1):311. doi: 10.1038/s41597-022-01420-4.
5
Metagenomic discovery of microbial eukaryotes in stool microbiomes.
mBio. 2024 Oct 16;15(10):e0206324. doi: 10.1128/mbio.02063-24. Epub 2024 Aug 29.
6
VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes.
BMC Bioinformatics. 2022 Oct 12;23(1):419. doi: 10.1186/s12859-022-04973-8.
7
ACR: metagenome-assembled prokaryotic and eukaryotic genome refinement tool.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad381.
8
Recovery of 197 eukaryotic bins reveals major challenges for eukaryote genome reconstruction from terrestrial metagenomes.
Mol Ecol Resour. 2023 Jul;23(5):1066-1076. doi: 10.1111/1755-0998.13776. Epub 2023 Mar 20.
9
Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes.
Microbiome. 2023 Apr 10;11(1):72. doi: 10.1186/s40168-023-01505-1.
10
Contigs directed gene annotation (ConDiGA) for accurate protein sequence database construction in metaproteomics.
Microbiome. 2024 Mar 19;12(1):58. doi: 10.1186/s40168-024-01775-3.

Cited by

1
The Evolutionary History and Modern Diversity of Triterpenoid Cyclases.
Mol Biol Evol. 2025 Sep 1;42(9). doi: 10.1093/molbev/msaf203.
2
The encoded and expressed biosynthetic potential of Greenland Ice Sheet microbes.
Front Microbiol. 2025 Jul 31;16:1620548. doi: 10.3389/fmicb.2025.1620548. eCollection 2025.
3
The evolutionary history and modern diversity of triterpenoid cyclases.
bioRxiv. 2025 Aug 2:2024.10.28.620730. doi: 10.1101/2024.10.28.620730.
4
Metagenomes from cyanobacterial harmful algal blooms from lakes in Ohio (USA).
Microbiol Resour Announc. 2025 Aug 14;14(8):e0040025. doi: 10.1128/mra.00400-25. Epub 2025 Jul 7.
5
Tracing non-fungal eukaryotic diversity via shotgun metagenomes in the complex mudflat intertidal zones.
mSystems. 2025 Jul 22;10(7):e0041325. doi: 10.1128/msystems.00413-25. Epub 2025 Jun 12.
6
Draft genome sequence of a marine coccolithophore NIES-4509.
Microbiol Resour Announc. 2025 Jul 10;14(7):e0135724. doi: 10.1128/mra.01357-24. Epub 2025 Jun 9.
7
Soil microbial responses to multiple global change factors as assessed by metagenomics.
Nat Commun. 2025 May 31;16(1):5058. doi: 10.1038/s41467-025-60390-4.
8
Eukfinder: a pipeline to retrieve microbial eukaryote genome sequences from metagenomic data.
mBio. 2025 May 14;16(5):e0069925. doi: 10.1128/mbio.00699-25. Epub 2025 Apr 10.
9
Meta-omics reveals role of photosynthesis in microbially induced carbonate precipitation at a CO2-rich geyser.
ISME Commun. 2024 Dec 11;4(1):ycae139. doi: 10.1093/ismeco/ycae139. eCollection 2024 Jan.
10
Chromosome-scale telomere to telomere genome assembly of common crystalwort (Riccia sorocarpa Bisch.).
Sci Data. 2025 Jan 15;12(1):77. doi: 10.1038/s41597-025-04373-6.