MIKE: an ultrafast, assembly-, and alignment-free approach for phylogenetic tree construction.

Affiliations

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China.

National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae154.

DOI: 10.1093/bioinformatics/btae154
PMID: 38547397
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10990684/
Abstract

MOTIVATION

Constructing a phylogenetic tree requires calculating the evolutionary distance between samples or species via large-scale resequencing data, a process that is both time-consuming and computationally demanding. Striking the right balance between accuracy and efficiency is a significant challenge.

RESULTS

To address this, we introduce a new algorithm, MIKE (MinHash-based k-mer algorithm). This algorithm is designed for the swift calculation of the Jaccard coefficient directly from raw sequencing reads and enables the construction of phylogenetic trees based on the resultant Jaccard coefficient. Simulation results highlight the superior speed of MIKE compared to existing state-of-the-art methods. We used MIKE to reconstruct a phylogenetic tree, incorporating 238 yeast, 303 Zea, 141 Ficus, 67 Oryza, and 43 Saccharum spontaneum samples. MIKE demonstrated accurate performance across varying evolutionary scales, reproductive modes, and ploidy levels, proving itself as a powerful tool for phylogenetic tree construction.
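The general idea of estimating a Jaccard coefficient from k-mer MinHash sketches can be illustrated with a minimal sketch. This is not MIKE's implementation: the function names, the `k` and `size` defaults, the BLAKE2b hash, the bottom-`size` sketching scheme, and the use of `1 - J` as a distance are all assumptions chosen for illustration, and the code makes no attempt to filter sequencing errors from raw reads.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(read, k):
    """Yield the canonical form of every k-mer in a read.

    The canonical form is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key.
    """
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, reverse_complement)


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(reads, k=21, size=1000):
    """Return the `size` smallest distinct k-mer hashes seen in the reads."""
    hashes = {kmer_hash(km) for read in reads for km in canonical_kmers(read, k)}
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard coefficient from two bottom-`size` sketches.

    Take the `size` smallest hashes of the merged sketches and count the
    fraction that occur in both samples.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def jaccard_distance_matrix(samples, k=21, size=1000):
    """Pairwise 1 - J distances for a dict of sample name -> list of reads."""
    names = sorted(samples)
    sketches = {name: minhash_sketch(samples[name], k, size) for name in names}
    return {
        (a, b): 1.0 - jaccard_estimate(sketches[a], sketches[b], size)
        for a in names
        for b in names
    }
```

A distance matrix of this kind could then be passed to a standard distance-based tree builder such as neighbor joining. When a sample has fewer than `size` distinct k-mers, the sketch holds all of them and the estimate equals the exact Jaccard coefficient; larger `size` values trade memory for lower estimation variance.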

AVAILABILITY AND IMPLEMENTATION

MIKE is publicly available on GitHub at https://github.com/Argonum-Clever2/mike.git.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ed9/10990684/fb88714f4b61/btae154f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ed9/10990684/3f84927c08e0/btae154f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ed9/10990684/72be7e4c3979/btae154f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ed9/10990684/3f38b023dd78/btae154f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ed9/10990684/919d467830c4/btae154f5.jpg
