Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
G3 (Bethesda). 2021 Sep 6;11(9). doi: 10.1093/g3journal/jkab250.
Identification and retrieval of genes of interest from genomic data is an essential step for many bioinformatic applications. We present orthofisher, a command-line tool for automated identification and retrieval of genes with high sequence similarity to a query profile Hidden Markov Model (HMM), built from a sequence alignment, across a set of proteomes. Performance assessment of orthofisher revealed high accuracy and precision during single-copy orthologous gene identification. orthofisher may be useful for assessing gene annotation quality, identifying single-copy orthologous genes for phylogenomic analyses, estimating gene copy number, and other evolutionary analyses that rely on identification and retrieval of homologous genes from genomic data. orthofisher comes with comprehensive documentation (https://jlsteenwyk.com/orthofisher/), is freely available under the MIT license, and can be downloaded from GitHub (https://github.com/JLSteenwyk/orthofisher), PyPI (https://pypi.org/project/orthofisher/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/orthofisher).
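The copy-number use case mentioned in the abstract can be illustrated with a minimal sketch. It is not orthofisher's implementation: the `Hit` record, the `classify_hits` function, and both thresholds (`max_evalue`, `score_fraction`) are hypothetical names and values chosen for illustration. The sketch assumes that hits from a profile HMM search (for example, as reported by a tool such as HMMER) have already been parsed into records, and it labels each profile in each proteome as absent, single-copy, or multi-copy.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One profile-HMM search hit (hypothetical record layout)."""
    proteome: str
    profile: str
    gene: str
    evalue: float
    bitscore: float


def classify_hits(hits, proteomes, profiles, max_evalue=1e-3, score_fraction=0.85):
    """Label each (proteome, profile) pair as absent, single-copy, or multi-copy.

    Both thresholds are illustrative assumptions:
    - hits with an E-value above max_evalue are discarded;
    - remaining hits scoring at least score_fraction of the best
      bitscore are counted as copies of the gene.
    """
    # Group the hits that pass the E-value filter by proteome and profile.
    grouped = defaultdict(list)
    for hit in hits:
        if hit.evalue <= max_evalue:
            grouped[(hit.proteome, hit.profile)].append(hit)

    calls = {}
    for proteome in proteomes:
        for profile in profiles:
            kept = grouped.get((proteome, profile), [])
            if not kept:
                calls[(proteome, profile)] = ("absent", [])
                continue
            best = max(h.bitscore for h in kept)
            # Genes whose score is close to the best hit count as copies.
            strong = sorted(h.gene for h in kept if h.bitscore >= score_fraction * best)
            label = "single-copy" if len(strong) == 1 else "multi-copy"
            calls[(proteome, profile)] = (label, strong)
    return calls
```

Calling `classify_hits(hits, proteomes, profiles)` returns a dictionary keyed by `(proteome, profile)`, so the single-copy entries can be collected across proteomes when assembling a phylogenomic data matrix. Scoring relative to the best hit is only one possible rule; a fixed bitscore cutoff would be an equally simple alternative.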