SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics.

Authors

Roure Béatrice, Rodriguez-Ezpeleta Naiara, Philippe Hervé

Affiliation

Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada.

Publication

BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2148-7-S1-S2.

PMID: 17288575
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1796611/
Abstract

BACKGROUND

Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences can lead to a lack of resolution. A tool that selects sequences, species and genes while dealing with these issues is needed in a phylogenomic context.

RESULTS

Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras from partial sequences, or selects, among multiple sequences, the orthologous and/or slowest-evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise.
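Two of the per-group operations described above, picking one sequence among several copies and fusing partial sequences into a chimera, can be sketched as follows. This is a minimal illustration, not SCaFoS's code. The abstract does not say how SCaFoS measures evolutionary rate, so the sketch assumes a simple stand-in: the mean p-distance from each candidate to caller-supplied reference sequences (for example, sequences from other groups), with a lower value read as a slower-evolving copy. It also assumes that when partial sequences overlap, the sequence listed first wins. The names `p_distance`, `select_slowest`, `fuse_partial`, the `MISSING` symbol set and the example sequences are all hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: symbols treated as "no data" in an aligned sequence.
MISSING = {"-", "?"}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over columns where both sequences have data."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        if x != y:
            differing += 1
    # No shared columns: treat the pair as infinitely distant.
    return differing / compared if compared else float("inf")


def select_slowest(candidates: Dict[str, str], references: Sequence[str]) -> str:
    """Return the candidate name with the smallest mean p-distance to the references.

    Hypothetical stand-in for "slowest evolving": a lower mean distance to the
    references is read as a shorter branch. Ties go to the first candidate.
    """
    if not candidates or not references:
        raise ValueError("need at least one candidate and one reference sequence")

    def mean_distance(name: str) -> float:
        seq = candidates[name]
        return sum(p_distance(seq, ref) for ref in references) / len(references)

    return min(candidates, key=mean_distance)


def fuse_partial(sequences: List[str]) -> str:
    """Build a chimera from aligned partial sequences of one monophyletic group.

    At each column, the first sequence (in list order) that has data contributes
    its residue; columns where no sequence has data stay '-'.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    fused = []
    for col in range(length):
        residue = "-"
        for seq in sequences:
            if seq[col] not in MISSING:
                residue = seq[col]
                break
        fused.append(residue)
    return "".join(fused)


if __name__ == "__main__":
    # Hypothetical copies of one gene in a single species.
    copies = {"copy1": "MKVLAG", "copy2": "MRTLAG"}
    print(select_slowest(copies, ["MKVLSG"]))  # copy1
    # Hypothetical partial sequences from two species of the same group.
    print(fuse_partial(["MKV---", "---LAG"]))  # MKVLAG
```

Ranking copies by distance to references is only a rough proxy for rate. A real pipeline would typically use model-based (for example, maximum-likelihood) distances and would also need to screen for paralogs, which this sketch does not attempt.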

CONCLUSION

SCaFoS is able to deal with datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated into an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimeric sequences to reduce missing data and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast-evolving species.
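Selecting genes by their level of missing data and concatenating them into a super-matrix can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not SCaFoS's implementation. It assumes that "missing data" for a gene means the fraction of taxa (or group representatives) with no sequence for that gene, that absent taxa are padded with `?`, and that output is relaxed sequential PHYLIP. SCaFoS's own missing-data definition, thresholds and output formats may differ. `build_supermatrix`, `to_phylip`, `max_missing` and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: taxon name -> aligned sequence, one dict per gene.
Alignment = Dict[str, str]


def missing_fraction(alignment: Alignment, taxa: List[str]) -> float:
    """Fraction of the requested taxa that have no sequence for this gene."""
    absent = sum(1 for taxon in taxa if taxon not in alignment)
    return absent / len(taxa)


def build_supermatrix(
    genes: Dict[str, Alignment], taxa: List[str], max_missing: float
) -> Tuple[Dict[str, str], List[str]]:
    """Keep genes with missing fraction <= max_missing and concatenate them.

    Taxa absent from a retained gene are padded with '?' over that gene's length.
    Returns the concatenated rows (one per taxon) and the retained gene names.
    """
    kept = [
        name
        for name, aln in genes.items()
        if aln and missing_fraction(aln, taxa) <= max_missing
    ]
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for name in kept:
        aln = genes[name]
        length = len(next(iter(aln.values())))  # assumes all rows are aligned
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "?" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}, kept


def to_phylip(rows: Dict[str, str]) -> str:
    """Serialize rows as relaxed sequential PHYLIP: header, then 'name sequence'."""
    nchar = len(next(iter(rows.values()))) if rows else 0
    lines = [f"{len(rows)} {nchar}"]
    lines += [f"{name} {seq}" for name, seq in rows.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genes = {
        "geneA": {"Homo": "MKV", "Mus": "MKI", "Danio": "MRV"},
        "geneB": {"Homo": "LAG", "Mus": "LSG"},
        "geneC": {"Homo": "PQ"},
    }
    rows, kept = build_supermatrix(genes, ["Homo", "Mus", "Danio"], max_missing=0.34)
    print(kept)  # ['geneA', 'geneB']
    print(to_phylip(rows), end="")
```

With `max_missing=0.34`, `geneB` (absent from one of three taxa) is kept and `geneC` (absent from two) is dropped. Raising the threshold keeps more genes at the cost of more `?` cells, which is the trade-off between phylogenetic information and missing data that the abstract describes.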

Figures:
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/8fadc86a35f5/1471-2148-7-S1-S2-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/c992d63a7886/1471-2148-7-S1-S2-2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/6b7037cccaa5/1471-2148-7-S1-S2-3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/64c4a7206b4f/1471-2148-7-S1-S2-4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/7bc8ac934a98/1471-2148-7-S1-S2-5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/52a63483635a/1471-2148-7-S1-S2-6.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1697/1796611/366e549dc437/1471-2148-7-S1-S2-7.jpg
