Clustering Highly Divergent Homologous Proteins: An Alignment-Free Method.

Authors

Laura Muñoz-Baena, Art F. Y. Poon

Affiliations

Department of Microbiology and Immunology, Western University, London, Ontario, Canada.

Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada.

Publication details

Curr Protoc. 2023 Feb;3(2):e666. doi: 10.1002/cpz1.666.

DOI: 10.1002/cpz1.666
PMID: 36809686
Abstract

The comparative analysis of amino acid sequences is an important tool in molecular biology, and it often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to align protein-coding sequences accurately, or even to identify homologous regions in different genomes. In this article, we describe an alignment-free method for classifying homologous protein-coding regions from different genomes. The method was originally developed for comparing genomes within virus families, but it may be adapted to other organisms. We quantify sequence homology from the overlap (intersection distance) of the k-mer (word) frequency distributions of different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to visualize the results, both by summarizing the composition of each cluster with respect to protein annotations and by coloring the protein-coding regions of genomes according to their cluster assignments. These visualizations provide a quick way to assess the reliability of the clustering results from the distribution of homologous genes among genomes.

© 2023 Wiley Periodicals LLC.

Basic Protocol 1: Data collection and processing
Basic Protocol 2: Calculating k-mer distances
Basic Protocol 3: Extracting clusters of homology
Support Protocol: Genome plot based on clustering results
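The abstract measures homology as the overlap (intersection distance) of k-mer frequency distributions. The following is a minimal Python sketch of that idea, assuming the distance is one minus the histogram intersection of the normalised k-mer frequencies. The word length `k=3` and the helper names `kmer_counts`, `intersection_distance` and `distance_matrix` are hypothetical illustrative choices, not taken from the protocol.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (words of length k) in a protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def intersection_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 - sum_w min(p_a(w), p_b(w)), where p_x is the k-mer frequency
    distribution of sequence x (counts normalised to sum to 1).

    Assumed definition: 0.0 means identical k-mer distributions,
    1.0 means the two sequences share no k-mers at all.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("each sequence must be at least k residues long")
    # Only words present in both sequences contribute to the overlap.
    overlap = sum(min(ca[w] / na, cb[w] / nb) for w in ca.keys() & cb.keys())
    return 1.0 - overlap


def distance_matrix(seqs, k: int = 3):
    """Symmetric matrix (list of lists) of pairwise intersection distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = intersection_distance(seqs[i], seqs[j], k)
    return d


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWWHHPPE"]  # toy sequences
    for row in distance_matrix(seqs):
        print(["%.3f" % x for x in row])
```

Because only word counts are compared, the distance is defined for any pair of sequences at least `k` residues long, even when they are too divergent to align.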

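The abstract says clusters are extracted from the distance matrix by combining dimensionality reduction with hierarchical clustering, but it does not name the specific algorithms. The sketch below is a stand-in under that assumption. It embeds the matrix with classical (Torgerson) multidimensional scaling in NumPy, then builds an average-linkage tree with SciPy's `scipy.cluster.hierarchy.linkage` and cuts it at a fixed height with `fcluster`. The choice of MDS and average linkage, the `n_components` and `cut_height` values, and the helper names `classical_mds` and `cluster_homologs` are all hypothetical; the protocol itself may use different methods.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def classical_mds(d, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) multidimensional scaling of a symmetric
    distance matrix into n_components coordinates per item."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale


def cluster_homologs(d, n_components: int = 2, cut_height: float = 0.5) -> np.ndarray:
    """Embed the distance matrix, build an average-linkage tree on the
    embedded points, and cut it at cut_height. Returns labels 1..K."""
    coords = classical_mds(d, n_components)
    tree = linkage(pdist(coords), method="average")
    return fcluster(tree, t=cut_height, criterion="distance")


if __name__ == "__main__":
    # Toy matrix: items 0-1 and items 2-3 form two tight groups.
    toy = [[0.0, 0.1, 0.9, 0.9],
           [0.1, 0.0, 0.9, 0.9],
           [0.9, 0.9, 0.0, 0.1],
           [0.9, 0.9, 0.1, 0.0]]
    print(cluster_homologs(toy))
```

Any symmetric distance matrix can be passed in, such as one built from pairwise k-mer intersection distances. The cut height controls how many clusters are returned and would need tuning, for example by checking cluster composition against protein annotations as the abstract describes.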
Similar articles

1. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.
   BMC Bioinformatics. 2013;14 Suppl 11(Suppl 11):S2. doi: 10.1186/1471-2105-14-S11-S2. Epub 2013 Sep 13.
2. Clustering evolving proteins into homologous families.
   BMC Bioinformatics. 2013 Apr 8;14:120. doi: 10.1186/1471-2105-14-120.
3. High-quality sequence clustering guided by network topology and multiple alignment likelihood.
   Bioinformatics. 2012 Apr 15;28(8):1078-85. doi: 10.1093/bioinformatics/bts098. Epub 2012 Feb 25.
4. MACHOS: Markov clusters of homologous subsequences.
   Bioinformatics. 2008 Jul 1;24(13):i77-85. doi: 10.1093/bioinformatics/btn144.
5. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks.
   BMC Bioinformatics. 2005 Oct 3;6:242. doi: 10.1186/1471-2105-6-242.
6. CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes.
   BMC Bioinformatics. 2006 Oct 24;7:472. doi: 10.1186/1471-2105-7-472.
7. Evaluation and improvements of clustering algorithms for detecting remote homologous protein families.
   BMC Bioinformatics. 2015 Feb 5;16:34. doi: 10.1186/s12859-014-0445-4.
8. transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.
   BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
9. Comparative evaluation of word composition distances for the recognition of SCOP relationships.
   Bioinformatics. 2004 Jan 22;20(2):206-15. doi: 10.1093/bioinformatics/btg392.