KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

Author information

Laetsch Dominik R, Blaxter Mark L

Affiliations

Institute of Evolutionary Biology, University of Edinburgh, EH9 3JT, United Kingdom.

The James Hutton Institute, DD2 5DA Dundee, United Kingdom.

Publication information

G3 (Bethesda). 2017 Oct 5;7(10):3349-3357. doi: 10.1534/g3.117.300233.

Abstract

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data.
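The core idea of analyzing cluster presence and absence across a user-defined taxon set can be illustrated with a minimal Python sketch. This is not KinFin's implementation. The input format (one cluster per line, written as `cluster_id: protein protein ...`, with each protein ID prefixed by a taxon label and a dot), the function names `parse_clusters` and `classify`, the taxon labels, and the three category names are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict


def parse_clusters(lines, sep="."):
    """Map each cluster ID to the set of taxa contributing at least one protein.

    Hypothetical input format: 'OG0001: TaxA.p1 TaxB.p2', where the taxon
    label is the part of each protein ID before the first `sep`.
    """
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster_id, _, members = line.partition(":")
        taxa = {protein.split(sep, 1)[0] for protein in members.split()}
        clusters[cluster_id.strip()] = taxa
    return clusters


def classify(clusters, taxon_set):
    """Classify clusters relative to a user-defined taxon set.

    'specific':        present only in taxa inside the set
    'absent_from_set': present only in taxa outside the set
    'shared':          present both inside and outside the set
    Clusters with no members are ignored.
    """
    result = defaultdict(list)
    for cluster_id, taxa in clusters.items():
        inside = taxa & taxon_set
        outside = taxa - taxon_set
        if inside and outside:
            result["shared"].append(cluster_id)
        elif inside:
            result["specific"].append(cluster_id)
        elif outside:
            result["absent_from_set"].append(cluster_id)
    return dict(result)


if __name__ == "__main__":
    clusters = parse_clusters([
        "OG0001: TaxA.p1 TaxB.p2 TaxC.p3",
        "OG0002: TaxA.p4 TaxB.p5",
        "OG0003: TaxC.p6",
    ])
    print(classify(clusters, {"TaxA", "TaxB"}))
    # {'shared': ['OG0001'], 'specific': ['OG0002'], 'absent_from_set': ['OG0003']}
```

Swapping in a different taxon set (for example, one defined by a life history trait rather than by taxonomy) only changes the `taxon_set` argument, which mirrors the abstract's point that the same clustering can be reanalyzed under several groupings.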
