Tiara: deep learning-based classification system for eukaryotic sequences.

Affiliation

Institute of Evolutionary Biology, Faculty of Biology & Biological and Chemical Research Centre, University of Warsaw, Warszawa 02-089, Poland.

Publication

Bioinformatics. 2022 Jan 3;38(2):344-350. doi: 10.1093/bioinformatics/btab672.

PMID: 34570171
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8722755/

Abstract

MOTIVATION

With a large number of metagenomic datasets now available, eukaryotic metagenomics has emerged as a new challenge. Proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity.

RESULTS

We developed Tiara, a deep-learning-based approach for identifying eukaryotic sequences in metagenomic datasets. Its two-step classification process first classifies the nuclear and organellar eukaryotic fractions and then divides organellar sequences into plastidial and mitochondrial ones. On the test dataset, Tiara performed similarly to EukRep in classifying prokaryotic sequences and outperformed it in classifying eukaryotic sequences, with shorter computation time. In tests on real data, Tiara performed better than EukRep on both a small dataset representing a eukaryotic cell microbiome and a large dataset from the oceanic pelagic zone. Tiara is also the only available tool that correctly classifies organellar sequences, as confirmed by the recovery of nearly complete plastid and mitochondrial genomes from both the test data and real metagenomic data.
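The two-step process described above can be pictured as a hierarchical dispatch: every sequence receives a first-stage label, and only sequences labelled as organellar are passed to a second classifier that separates plastid from mitochondrial sequences. The following is a minimal sketch under stated assumptions, not Tiara's implementation. The label names, the `classify_two_step` function and the toy stand-in classifiers are hypothetical, and the real tool uses trained deep-learning models in both stages.

```python
from typing import Callable, Dict, Tuple

# Hypothetical label sets (assumption): the real Tiara models may use other class names.
FIRST_STAGE_LABELS = {"eukarya", "organelle", "prokarya", "unknown"}
SECOND_STAGE_LABELS = {"mitochondrion", "plastid", "unknown"}

# A classifier here is any function mapping a DNA sequence to a label.
Classifier = Callable[[str], str]


def classify_two_step(
    sequences: Dict[str, str],
    first_stage: Classifier,
    second_stage: Classifier,
) -> Dict[str, Tuple[str, str]]:
    """Return {seq_id: (first_stage_label, second_stage_label)}.

    The second stage runs only on sequences the first stage calls "organelle";
    all other sequences get the placeholder second-stage label "n/a".
    """
    results: Dict[str, Tuple[str, str]] = {}
    for seq_id, seq in sequences.items():
        label1 = first_stage(seq)
        if label1 not in FIRST_STAGE_LABELS:
            raise ValueError(f"unexpected first-stage label: {label1!r}")
        if label1 == "organelle":
            # Step two: split the organellar fraction into plastid vs mitochondrion.
            label2 = second_stage(seq)
            if label2 not in SECOND_STAGE_LABELS:
                raise ValueError(f"unexpected second-stage label: {label2!r}")
        else:
            label2 = "n/a"
        results[seq_id] = (label1, label2)
    return results


if __name__ == "__main__":
    # Toy stand-in classifiers with arbitrary rules (hypothetical, NOT Tiara's networks).
    def toy_first(seq: str) -> str:
        return "organelle" if seq.startswith("AT") else "eukarya"

    def toy_second(seq: str) -> str:
        return "plastid" if "TTT" in seq else "mitochondrion"

    demo = {"contig_1": "ATTTGC", "contig_2": "GCGCGC", "contig_3": "ATGCGA"}
    for seq_id, labels in classify_two_step(demo, toy_first, toy_second).items():
        print(seq_id, *labels)
```

Keeping the two stages as separate callables reflects the ordering in the abstract: the plastid/mitochondrion split is attempted only for sequences already assigned to the organellar fraction, so the second model never changes a nuclear or prokaryotic call.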

AVAILABILITY AND IMPLEMENTATION

Tiara is implemented in Python 3.8, is available at https://github.com/ibe-uw/tiara and has been tested on Unix-based systems. It is released under the open-source MIT license, and documentation is available at https://ibe-uw.github.io/tiara. Version 1.0.1 of Tiara was used for all benchmarks.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

