phylostratr: a framework for phylostratigraphy.

Author information

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA.

Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA.

Publication information

Bioinformatics. 2019 Oct 1;35(19):3617-3627. doi: 10.1093/bioinformatics/btz171.

Abstract

MOTIVATION

The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum.
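The definition above can be illustrated with a minimal Python sketch. This is not phylostratr's implementation (phylostratr is an R package); the lineage, the species-to-stratum mapping, and the function name `assign_phylostratum` are all hypothetical and chosen only to show the idea: each searched species is placed in the youngest clade it shares with the focal species, and a gene's phylostratum is the oldest such clade in which a homolog was found.

```python
from typing import Dict, List, Set

# Hypothetical, heavily simplified lineage of the focal species,
# ordered from the oldest (broadest) clade to the youngest.
LINEAGE: List[str] = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Brassicaceae",
    "Arabidopsis thaliana",
]

# Hypothetical mapping: each searched species belongs to the stratum of the
# youngest lineage clade it shares with the focal species.
SPECIES_STRATUM: Dict[str, str] = {
    "Escherichia coli": "cellular organisms",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Oryza sativa": "Viridiplantae",
    "Brassica rapa": "Brassicaceae",
    "Arabidopsis thaliana": "Arabidopsis thaliana",
}


def assign_phylostratum(species_with_homolog: Set[str]) -> str:
    """Return the oldest lineage clade that contains a species with a homolog.

    A gene with no homolog outside the focal species is assigned to the
    youngest stratum (the focal species itself).
    """
    hit_strata = {SPECIES_STRATUM[s] for s in species_with_homolog}
    for clade in LINEAGE:  # iterate from oldest to youngest
        if clade in hit_strata:
            return clade
    return LINEAGE[-1]


if __name__ == "__main__":
    # A gene with homologs in rice and Brassica rapa, but not in yeast or E. coli.
    print(assign_phylostratum({"Oryza sativa", "Brassica rapa"}))
```

In this sketch, homology calls are taken as given; in practice they come from a similarity search and a classifier, and the choice of both affects the inferred stratum.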

RESULTS

We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics; and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.
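One of the diagnostics listed above, detecting genes with inferred homologs in old clades but not in intermediate ones, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not phylostratr's R code: the function name `skipped_strata`, the example lineage, and the input format (a set of clade names in which at least one homolog was found) are hypothetical.

```python
from typing import List, Set


def skipped_strata(lineage: List[str], hit_strata: Set[str]) -> List[str]:
    """List intermediate strata that lack a homolog for one gene.

    `lineage` is ordered from the oldest clade to the focal species.
    `hit_strata` holds the clades in which a homolog was inferred.
    Only strata younger than the oldest hit and older than the focal
    species are considered, since the focal species always has the gene.
    """
    hit_indices = [i for i, clade in enumerate(lineage) if clade in hit_strata]
    if not hit_indices:
        return []
    oldest_hit = hit_indices[0]
    intermediates = lineage[oldest_hit + 1:-1]
    return [clade for clade in intermediates if clade not in hit_strata]


if __name__ == "__main__":
    # Hypothetical, simplified lineage for a plant focal species.
    lineage = [
        "cellular organisms",
        "Eukaryota",
        "Viridiplantae",
        "Brassicaceae",
        "Arabidopsis thaliana",
    ]
    # Homologs found in bacteria and in close relatives, but nothing between.
    print(skipped_strata(lineage, {"cellular organisms", "Brassicaceae"}))
```

A gene flagged this way may reflect gene loss, horizontal transfer, contamination, or a false-positive hit in the old clade; the sketch only reports the gap and does not distinguish between these explanations.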

AVAILABILITY AND IMPLEMENTATION

Source code available at https://github.com/arendsee/phylostratr.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
