PhyloAln: A Convenient Reference-Based Tool to Align Sequences and High-Throughput Reads for Phylogeny and Evolution in the Omic Era.

Affiliation

State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen 518107, China.

Publication information

Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae150.

Abstract

The current trend in phylogenetic and evolutionary analyses predominantly relies on omic data. However, prior to core analyses, traditional methods typically involve intricate and time-consuming procedures, including assembly from high-throughput reads, decontamination, gene prediction, homology search, orthology assignment, multiple sequence alignment, and matrix trimming. Such processes significantly impede the efficiency of research when dealing with extensive data sets. In this study, we develop PhyloAln, a convenient reference-based tool capable of directly aligning high-throughput reads or complete sequences with existing alignments as a reference for phylogenetic and evolutionary analyses. Through testing with simulated data sets of species spanning the tree of life, PhyloAln demonstrates consistently robust performance compared with other reference-based tools across different data types, sequencing technologies, coverages, and species, with percent completeness and identity at least 50 percentage points higher in the alignments. Additionally, we validate the efficacy of PhyloAln in removing a minimum of 90% foreign and 70% cross-contamination issues, which are prevalent in sequencing data but often overlooked by other tools. Moreover, we showcase the broad applicability of PhyloAln by generating alignments (completeness mostly larger than 80%, identity larger than 90%) and reconstructing robust phylogenies using real data sets of transcriptomes of ladybird beetles, plastid genes of peppers, or ultraconserved elements of turtles. With these advantages, PhyloAln is expected to facilitate phylogenetic and evolutionary analyses in the omic era. The tool is accessible at https://github.com/huangyh45/PhyloAln.
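The abstract reports results as percent completeness and percent identity of the generated alignments. As a rough illustration of what such metrics can look like, here is a minimal Python sketch. The definitions are assumptions made for this example, not the paper's exact formulas and not PhyloAln's code: completeness is taken as the share of reference positions that the aligned sequence covers, and identity as the share of positions covered by both sequences where the characters match. The function names and the `MISSING` character set are hypothetical.

```python
# Characters treated as "no data" in an aligned sequence (assumed for this sketch).
MISSING = frozenset("-?Nn")


def _check_lengths(query: str, reference: str) -> None:
    """Aligned sequences must have the same number of columns."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")


def percent_completeness(query: str, reference: str) -> float:
    """Percentage of reference (non-missing) columns that the query also covers."""
    _check_lengths(query, reference)
    ref_cols = [i for i, r in enumerate(reference) if r not in MISSING]
    if not ref_cols:
        return 0.0
    covered = sum(1 for i in ref_cols if query[i] not in MISSING)
    return 100.0 * covered / len(ref_cols)


def percent_identity(query: str, reference: str) -> float:
    """Percentage of matching characters among columns covered by both sequences."""
    _check_lengths(query, reference)
    pairs = [
        (q.upper(), r.upper())
        for q, r in zip(query, reference)
        if q not in MISSING and r not in MISSING
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, r in pairs if q == r)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    reference = "ATGGCC--TA"  # toy reference row of an alignment
    query = "ATG-CTAATA"      # toy sequence aligned to the same columns
    print(percent_completeness(query, reference))      # 87.5
    print(f"{percent_identity(query, reference):.1f}")  # 85.7
```

In the toy example, the query covers 7 of the 8 reference positions and matches the reference at 6 of the 7 positions they share. A real evaluation would compare each generated sequence against a known true alignment, as the simulated data sets in the study presumably allow.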
