
Clustering Highly Divergent Homologous Proteins: An Alignment-Free Method.

Author information

Muñoz-Baena Laura, Poon Art F Y

Affiliations

Department of Microbiology and Immunology, Western University, London, Ontario, Canada.

Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada.

Publication information

Curr Protoc. 2023 Feb;3(2):e666. doi: 10.1002/cpz1.666.

Abstract

The comparative analysis of amino acid sequences is an important tool in molecular biology that often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to accurately align protein-coding sequences, or even to identify homologous regions in different genomes. In this article, we describe an alignment-free method for the classification of homologous protein-coding regions from different genomes. This methodology was originally developed for comparing genomes within virus families, but may be adapted for other organisms. We quantify sequence homology from the overlap (intersection distance) of the k-mer (word) frequency distributions for different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to generate visualizations of the composition of clusters with respect to protein annotations, and by coloring protein-coding regions of genomes by cluster assignments. These provide a useful means to quickly assess the reliability of the clustering results based on the distribution of homologous genes among genomes. © 2023 Wiley Periodicals LLC.

Basic Protocol 1: Data collection and processing
Basic Protocol 2: Calculating k-mer distances
Basic Protocol 3: Extracting clusters of homology
Support Protocol: Genome plot based on clustering results
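
To make the k-mer intersection distance mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each protein is summarized by the relative frequencies of its overlapping k-mers and that the distance is one minus the summed minimum of the two frequency distributions; the published protocol may normalize differently. The function names, the choice of k = 3, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_freqs(seq, k=3):
    """Return the relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def intersection_distance(f1, f2):
    """One minus the overlap of two k-mer frequency distributions.

    Assumed normalization: overlap is the sum over shared k-mers of the
    smaller of the two relative frequencies, so the result lies in [0, 1].
    """
    shared = f1.keys() & f2.keys()
    overlap = sum(min(f1[w], f2[w]) for w in shared)
    return 1.0 - overlap


def distance_matrix(seqs, k=3):
    """Build a symmetric matrix of pairwise intersection distances."""
    freqs = [kmer_freqs(s, k) for s in seqs]
    n = len(freqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = intersection_distance(freqs[i], freqs[j])
    return dist


# Toy sequences, invented for illustration only (not real proteins).
proteins = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVW"]
matrix = distance_matrix(proteins, k=3)
```

In this toy input, the first two sequences differ only in their last residue and so share most of their 3-mers, while the third shares none with either. A matrix of this kind is the sort of input that the dimensionality reduction and hierarchical clustering steps would then operate on.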

