Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage.

Affiliation

National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India.

Publication information

PLoS One. 2013;8(2):e56449. doi: 10.1371/journal.pone.0056449. Epub 2013 Feb 20.

DOI:10.1371/journal.pone.0056449
PMID:23437136
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3577913/
Abstract

BACKGROUND

Developing sensitive sequence search procedures that detect distant relationships between proteins at the superfamily or fold level remains a major challenge. Intermediate sequence search is the most frequently used approach for identifying remote homologues. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin families were examined, using plant serine proteases from two genomes (A. thaliana and O. sativa) as queries, together with 13 other families of unrelated folds, to identify distant homologues that could not be obtained using PSI-BLAST.

METHODOLOGY/PRINCIPAL FINDINGS: We propose starting with multiple queries drawn from classical serine protease members to identify remote homologues in families, using a rigorous approach such as Cascade PSI-BLAST. Classical sequence-based approaches such as PSI-BLAST showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at the family level and 77% at the superfamily or fold level, along with a specificity of ~100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests such as jackknifing helped us better understand the influence of neighbouring protein families.
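The coverage, specificity and Matthews correlation coefficient quoted above are standard confusion-matrix measures. The following is a minimal sketch of how they are usually computed, assuming coverage is taken as sensitivity (true hits found over all true members); the function name `search_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def search_metrics(tp, fp, tn, fn):
    """Return (coverage, specificity, mcc) from confusion-matrix counts.

    tp: true family members detected      fn: true members missed
    fp: unrelated sequences detected      tn: unrelated sequences rejected
    Coverage is assumed here to mean sensitivity, tp / (tp + fn).
    """
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, specificity, mcc


# Hypothetical counts for illustration only (not data from the study).
coverage, specificity, mcc = search_metrics(tp=88, fp=0, tn=100, fn=12)
print(f"coverage={coverage:.2f} specificity={specificity:.2f} mcc={mcc:.2f}")
```

Because the coefficient combines all four counts, a search with near-perfect specificity can still score well below 1 when many true members are missed.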

CONCLUSIONS/SIGNIFICANCE: Our study suggests that using multiple queries from a family in Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level. We propose a generalized strategy for covering all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the 'bridging' role of related families.
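The cascading idea and the 'bridging' role of neighbouring families can be illustrated with a toy sketch. It assumes a cascaded search re-uses each newly found hit as the query for the next generation; the real PSI-BLAST run is replaced by a hypothetical lookup table `direct_hits`, and all sequence names are invented. This is not the authors' implementation.

```python
def cascade_search(queries, direct_hits, max_generations=3):
    """Toy cascaded search: hits of one generation seed the next.

    direct_hits maps a sequence name to the names a single (non-cascaded)
    search would return for it; it stands in for a real PSI-BLAST run.
    """
    found = set(queries)
    frontier = set(queries)
    for _ in range(max_generations):
        next_frontier = set()
        for seq in frontier:
            for hit in direct_hits.get(seq, ()):
                if hit not in found:
                    found.add(hit)
                    next_frontier.add(hit)
        if not next_frontier:  # converged: no new sequences detected
            break
        frontier = next_frontier
    return found - set(queries)


# Hypothetical families: A (query family), B (neighbouring), C (distant).
with_bridge = {"A1": ["A2"], "A2": ["B1"], "B1": ["C1"], "C1": []}
without_bridge = {"A1": ["A2"], "A2": [], "C1": []}

print(sorted(cascade_search(["A1"], with_bridge, max_generations=1)))
print(sorted(cascade_search(["A1"], with_bridge)))
print(sorted(cascade_search(["A1"], without_bridge)))
```

In this toy setting the distant member `C1` is reached only when the neighbouring-family sequence `B1` is present in the database, which is the kind of dependence a jackknife-style removal of families would expose.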

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/62cfddc847f2/pone.0056449.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/055c29f56e0d/pone.0056449.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/1dc26821648b/pone.0056449.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/61aef45ab91b/pone.0056449.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/4e6ab7b9d011/pone.0056449.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/b4c6710ff3f6/pone.0056449.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/7c329cc613fa/pone.0056449.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/3533feb0f885/pone.0056449.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee11/3577913/946d941b0a88/pone.0056449.g009.jpg
