Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA.
Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA.
Bioinformatics. 2019 Oct 1;35(19):3617-3627. doi: 10.1093/bioinformatics/btz171.
The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest (that is, the oldest and most inclusive) clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum.
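The assignment rule described above can be illustrated with a minimal sketch. It is not the phylostratr implementation: the lineage, the clade names and the function `assign_phylostratum` are all hypothetical, and homology inference is assumed to have been done already. Each "hit" names the clade level at which a homolog was found, meaning the homolog occurs in a species that shares that clade, but no narrower one, with the focal species.

```python
# Hypothetical lineage of a focal species, ordered from the youngest
# (narrowest) clade to the oldest (broadest). Names are illustrative only.
LINEAGE = [
    "Arabidopsis thaliana",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
    "cellular organisms",
]


def assign_phylostratum(clades_with_homolog, lineage=LINEAGE):
    """Return the broadest clade in `lineage` that contains an inferred homolog.

    A gene with no homolog outside the focal species is assigned to the
    first (species-level) stratum.
    """
    hits = set(clades_with_homolog)
    unknown = hits - set(lineage)
    if unknown:
        raise ValueError(f"clades not in lineage: {sorted(unknown)}")

    stratum = lineage[0]  # default: species-specific gene
    for clade in lineage:  # walk from youngest to oldest
        if clade in hits:
            stratum = clade  # keep the oldest clade seen so far
    return stratum
```

In this sketch a gene with hits at the "Brassicaceae" and "Eukaryota" levels is placed in the "Eukaryota" stratum, because only the oldest level with a homolog determines the result.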
We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates the analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades but not in intermediate ones; proteome quality assessments; false-positive diagnostics; and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparison of how analysis parameters or genomes influence phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.
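The first diagnostic listed above, genes with inferred homologs in old clades but not in intermediate ones, can be sketched as follows. This is only an illustration of the idea, not phylostratr's actual code: the function `skipped_levels` and the integer encoding of clade levels (0 for the focal species, larger numbers for progressively older clades) are assumptions made for the example.

```python
def skipped_levels(hit_levels):
    """Return intermediate clade levels that lack an inferred homolog.

    `hit_levels` is an iterable of integers: 0 is the focal species and
    larger values are progressively older (broader) clades. A level is
    "skipped" if it lies between the focal species and the deepest hit
    but has no hit of its own.
    """
    hits = set(hit_levels)
    if not hits:
        return []
    deepest = max(hits)
    # Level 0 is the focal species itself, so only levels 1..deepest-1
    # can count as skipped.
    return [level for level in range(1, deepest) if level not in hits]


# Example: a homolog is found only at level 4, so levels 1-3 are skipped.
print(skipped_levels({0, 4}))
# A gene with hits at every level up to its stratum skips nothing.
print(skipped_levels({0, 1, 2}))
```

A long run of skipped levels may point to a spurious deep hit or to missing data in the intermediate proteomes, which is why a check of this kind is useful alongside the phylostratum itself.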
Source code available at https://github.com/arendsee/phylostratr.
Supplementary data are available at Bioinformatics online.