A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families.

Authors

Wells Jonathan N, Marsh Joseph A

Affiliation

MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK.

Publication

Methods Mol Biol. 2019;1851:251-261. doi: 10.1007/978-1-4939-8736-8_13.

DOI: 10.1007/978-1-4939-8736-8_13
PMID: 30298401

Abstract

Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, due to the dependence of such methods on accurate sequence alignment.

We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, large sets of related proteins are generated. By representing the relationships between proteins in these sets as graphs, subsequent clustering with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
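
The workflow described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the `search` callback is a hypothetical stand-in for a profile HMM search (which in practice would be run with HH-suite), the protein IDs and scores in `TOY_HITS` are invented for illustration, and `mcl` is a simplified pure-Python rendering of the expansion and inflation steps of Markov clustering rather than the reference implementation. The inflation value of 2.0 is likewise an assumption.

```python
def expand_hits(seeds, search, rounds=2):
    """Multi-round search: re-query with every newly found protein.

    `search(query)` is a hypothetical callback returning (hit, score) pairs.
    """
    seen = set(seeds)
    frontier = list(seeds)
    edges = []
    for _ in range(rounds):
        next_frontier = []
        for query in frontier:
            for hit, score in search(query):
                edges.append((query, hit, score))
                if hit not in seen:
                    seen.add(hit)
                    next_frontier.append(hit)
        frontier = next_frontier
    return edges


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Simplified Markov clustering on an undirected weighted graph."""
    nodes = sorted({p for a, b, _ in edges for p in (a, b)})
    idx = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        i, j = idx[a], idx[b]
        m[i][j] = m[j][i] = max(m[i][j], w)  # symmetrize, keep best score
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as is customary for MCL
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains flow is an attractor; its nonzero columns
    # are the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Invented toy data: two tightly linked triples joined by one weak hit.
TOY_HITS = {
    "P1": [("P2", 1.0), ("P3", 1.0)],
    "P2": [("P1", 1.0), ("P3", 1.0)],
    "P3": [("P1", 1.0), ("P2", 1.0), ("P4", 0.1)],
    "P4": [("P3", 0.1), ("P5", 1.0), ("P6", 1.0)],
    "P5": [("P4", 1.0), ("P6", 1.0)],
    "P6": [("P4", 1.0), ("P5", 1.0)],
}

edges = expand_hits(["P1"], lambda q: TOY_HITS.get(q, []), rounds=4)
clusters = mcl(edges)
print(clusters)
```

Starting from the single seed `P1`, the repeated searches reach all six toy proteins, and the clustering step separates them into the two triples despite the weak link between `P3` and `P4`. Raising `inflation` generally yields finer clusters, and lowering it yields coarser ones.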

