
Agalma: an automated phylogenomics workflow.

Affiliation

Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, USA.

Publication information

BMC Bioinformatics. 2013 Nov 19;14:330. doi: 10.1186/1471-2105-14-330.

DOI: 10.1186/1471-2105-14-330
PMID: 24252138
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3840672/
Abstract

BACKGROUND

In the past decade, transcriptome data have become an important component of many phylogenetic studies. They are a cost-effective source of protein-coding gene sequences, and have helped projects grow from a few genes to hundreds or thousands of genes. Phylogenetic studies now regularly include genes from newly sequenced transcriptomes, as well as publicly available transcriptomes and genomes. Implementing such a phylogenomic study, however, is computationally intensive, requires the coordinated use of many complex software tools, and includes multiple steps for which no published tools exist. Phylogenomic studies have therefore been manual or semiautomated. In addition to taking considerable user time, this makes phylogenomic analyses difficult to reproduce, compare, and extend. In addition, methodological improvements made in the context of one study often cannot be easily applied and evaluated in the context of other studies.

RESULTS

We present Agalma, an automated tool that constructs matrices for phylogenomic analyses. The user provides raw Illumina transcriptome data, and Agalma produces annotated assemblies, aligned gene sequence matrices, a preliminary phylogeny, and detailed diagnostics that allow the investigator to make extensive assessments of intermediate analysis steps and the final results. Sequences from other sources, such as externally assembled genomes and transcriptomes, can also be incorporated in the analyses. Agalma is built on the BioLite bioinformatics framework, which tracks provenance, profiles processor and memory use, records diagnostics, manages metadata, installs dependencies, logs version numbers and calls to external programs, and enables rich HTML reports for all stages of the analysis. Agalma includes a small test data set and a built-in test analysis of these data. In addition to describing Agalma, we here present a sample analysis of a larger seven-taxon data set. Agalma is available for download at https://bitbucket.org/caseywdunn/agalma.
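The matrix-construction step described above can be illustrated with a minimal sketch. It assumes per-gene alignments are already available as plain dictionaries and concatenates them into one taxon-by-site matrix, padding taxa that lack a gene with gap characters. The function name `build_supermatrix` and the data layout are hypothetical and chosen for illustration only; this is not Agalma's code or API.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def build_supermatrix(
    gene_alignments: Dict[str, Alignment], missing: str = "-"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one taxon-by-site matrix.

    Returns the concatenated sequences and a list of (gene, start, end)
    partitions with 1-based, inclusive coordinates.
    """
    # Collect every taxon that appears in at least one gene.
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa without this gene are padded with the missing-data symbol.
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy input: two genes, three taxa, each gene missing one taxon.
    genes = {
        "geneA": {"sp1": "MKV", "sp2": "MRV"},
        "geneB": {"sp1": "LLQA", "sp3": "LIQA"},
    }
    matrix, partitions = build_supermatrix(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(partitions)
```

Recording the partition boundaries alongside the matrix keeps each gene's position recoverable, which is what downstream partitioned analyses typically need.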

CONCLUSIONS

Agalma allows complex phylogenomic analyses to be implemented and described unambiguously as a series of high-level commands. This will enable phylogenomic studies to be readily reproduced, modified, and extended. Agalma also facilitates methods development by providing a complete modular workflow, bundled with test data, that will allow further optimization of each step in the context of a full phylogenomic analysis.
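As a rough illustration of describing an analysis as a series of high-level commands with recorded diagnostics, the sketch below runs an ordered list of stages and logs what each one did. The stage functions (`drop_short_reads`, `remove_duplicates`), the length threshold, and the log fields are all hypothetical; they are not Agalma or BioLite commands.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[List[str]], List[str]]


def run_pipeline(
    stages: List[Stage], data: List[str]
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Run stages in order and keep a simple provenance log for each one."""
    log: List[Dict[str, Any]] = []
    for stage in stages:
        started = time.perf_counter()
        data = stage(data)
        log.append(
            {
                "stage": stage.__name__,
                "seconds": time.perf_counter() - started,
                "output_size": len(data),
            }
        )
    return data, log


def drop_short_reads(reads: List[str]) -> List[str]:
    # Hypothetical filter: keep reads of at least 5 bases.
    return [read for read in reads if len(read) >= 5]


def remove_duplicates(reads: List[str]) -> List[str]:
    # Keep the first occurrence of each read, preserving order.
    return list(dict.fromkeys(reads))


if __name__ == "__main__":
    reads = ["ACGTAC", "ACG", "ACGTAC", "TTGACA"]
    result, log = run_pipeline([drop_short_reads, remove_duplicates], reads)
    print(result)
    for entry in log:
        print(entry["stage"], entry["output_size"])
```

Because the whole analysis is the ordered list passed to `run_pipeline`, rerunning or modifying it amounts to editing that list, and the log gives a per-stage record to compare between runs.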


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa9/3840672/84ee20c3845e/1471-2105-14-330-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa9/3840672/757f52517cf2/1471-2105-14-330-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa9/3840672/0aa037d0cf79/1471-2105-14-330-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa9/3840672/41e26b8ea204/1471-2105-14-330-4.jpg

Similar articles

1
Agalma: an automated phylogenomics workflow.
BMC Bioinformatics. 2013 Nov 19;14:330. doi: 10.1186/1471-2105-14-330.
2
Revising transcriptome assemblies with phylogenetic information.
PLoS One. 2021 Jan 12;16(1):e0244202. doi: 10.1371/journal.pone.0244202. eCollection 2021.
3
OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.
BMC Bioinformatics. 2009 Jul 16;10:219. doi: 10.1186/1471-2105-10-219.
4
Phylogenomics Using Transcriptome Data.
Methods Mol Biol. 2016;1452:65-80. doi: 10.1007/978-1-4939-3774-5_4.
5
Orthology inference in nonmodel organisms using transcriptomes and low-coverage genomes: improving accuracy and matrix occupancy for phylogenomics.
Mol Biol Evol. 2014 Nov;31(11):3081-92. doi: 10.1093/molbev/msu245. Epub 2014 Aug 25.
6
phyloSkeleton: taxon selection, data retrieval and marker identification for phylogenomics.
Bioinformatics. 2017 Apr 15;33(8):1230-1232. doi: 10.1093/bioinformatics/btw824.
7
BaCoCa--a heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions.
Mol Phylogenet Evol. 2014 Jan;70:94-8. doi: 10.1016/j.ympev.2013.09.011. Epub 2013 Sep 25.
8
GToTree: a user-friendly workflow for phylogenomics.
Bioinformatics. 2019 Oct 15;35(20):4162-4164. doi: 10.1093/bioinformatics/btz188.
9
10
Reptilian Transcriptomes v2.0: An Extensive Resource for Sauropsida Genomics and Transcriptomics.
Genome Biol Evol. 2015 Jul 1;7(6):1827-41. doi: 10.1093/gbe/evv106.

Articles citing this article

1
wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs.
Bioinform Adv. 2024 Nov 27;4(1):vbae189. doi: 10.1093/bioadv/vbae189. eCollection 2024.
2
Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling.
Mol Biol Evol. 2023 Aug 3;40(8). doi: 10.1093/molbev/msad175.
3
Confusion will be my epitaph: genome-scale discordance stifles phylogenetic resolution of Holothuroidea.
Proc Biol Sci. 2023 Jul 12;290(2002):20230988. doi: 10.1098/rspb.2023.0988.
4
PlantTribes2: Tools for comparative gene family analysis in plant genomics.
Front Plant Sci. 2023 Jan 31;13:1011199. doi: 10.3389/fpls.2022.1011199. eCollection 2022.
5
The evolution of ovary-biased gene expression in Hawaiian Drosophila.
PLoS Genet. 2023 Jan 23;19(1):e1010607. doi: 10.1371/journal.pgen.1010607. eCollection 2023 Jan.
6
OrthoSNAP: A tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees.
PLoS Biol. 2022 Oct 13;20(10):e3001827. doi: 10.1371/journal.pbio.3001827. eCollection 2022 Oct.
7
Positive selection and heat-response transcriptomes reveal adaptive features of the Brassicaceae desert model, Anastatica hierochuntica.
New Phytol. 2022 Nov;236(3):1006-1026. doi: 10.1111/nph.18411. Epub 2022 Aug 26.
8
Phylogenomic Analysis Reconstructed the Order Matoniales from Paleopolyploidy Veil.
Plants (Basel). 2022 Jun 7;11(12):1529. doi: 10.3390/plants11121529.
9
Using all Gene Families Vastly Expands Data Available for Phylogenomic Inference.
Mol Biol Evol. 2022 Jun 2;39(6). doi: 10.1093/molbev/msac112.
10
Phylogenomic analyses of echinoid diversification prompt a re-evaluation of their fossil record.
Elife. 2022 Mar 22;11:e72460. doi: 10.7554/eLife.72460.

References cited by this article

1
Inferring ancient divergences requires genes with strong phylogenetic signals.
Nature. 2013 May 16;497(7449):327-31. doi: 10.1038/nature12130. Epub 2013 May 8.
2
Inferring hierarchical orthologous groups from orthologous gene pairs.
PLoS One. 2013;8(1):e53786. doi: 10.1371/journal.pone.0053786. Epub 2013 Jan 14.
3
Genome-scale coestimation of species and gene trees.
Genome Res. 2013 Feb;23(2):323-30. doi: 10.1101/gr.141978.112. Epub 2012 Nov 6.
4
Fast gapped-read alignment with Bowtie 2.
Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
5
Resolving the evolutionary relationships of molluscs with phylogenomic tools.
Nature. 2011 Oct 26;480(7377):364-7. doi: 10.1038/nature10526.
6
MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.
PLoS One. 2011;6(9):e22594. doi: 10.1371/journal.pone.0022594. Epub 2011 Sep 16.
7
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
BMC Bioinformatics. 2011 Aug 4;12:323. doi: 10.1186/1471-2105-12-323.
8
Full-length transcriptome assembly from RNA-Seq data without a reference genome.
Nat Biotechnol. 2011 May 15;29(7):644-52. doi: 10.1038/nbt.1883.
9
DendroPy: a Python library for phylogenetic computing.
Bioinformatics. 2010 Jun 15;26(12):1569-71. doi: 10.1093/bioinformatics/btq228. Epub 2010 Apr 25.
10
The dynamic genome of Hydra.
Nature. 2010 Mar 25;464(7288):592-6. doi: 10.1038/nature08830. Epub 2010 Mar 14.