Supervised multivariate analysis of sequence groups to identify specificity determining residues.

Author information

Wallace Iain M, Higgins Desmond G

Affiliation

The Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland.

Publication information

BMC Bioinformatics. 2007 Apr 23;8:135. doi: 10.1186/1471-2105-8-135.

DOI:10.1186/1471-2105-8-135

PMID:17451607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1878507/

Abstract

BACKGROUND

Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able to identify the residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from multiple sequence alignments of protein families with different substrate specificities.

RESULTS

We demonstrate the usefulness of this method on three test cases. Two of them, the Lactate/Malate dehydrogenase family and the Nucleotidyl Cyclases, consist of two functional groups. The third, the Serine Proteases, consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids.
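The abstract does not say which two amino acid encoding schemes were used. As an illustration only, the sketch below assumes a simple binary (one-hot) encoding, in which each alignment column becomes 20 indicator variables and gaps or unknown symbols are left as zeros. The function name `one_hot_encode` and the toy alignment are hypothetical, not taken from the paper.

```python
import numpy as np

# The 20 standard amino acids; anything else (gaps, X) is encoded as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(alignment):
    """Encode equal-length aligned sequences as a binary matrix.

    Rows are sequences; each alignment column contributes 20 indicator
    variables, so the result has shape (n_sequences, n_columns * 20).
    This is an assumed encoding, not necessarily one used in the paper.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_aa = len(AMINO_ACIDS)
    X = np.zeros((len(alignment), n_cols * n_aa))
    for row, seq in enumerate(alignment):
        for col, aa in enumerate(seq.upper()):
            k = index.get(aa)
            if k is not None:  # skip gaps and unknown residues
                X[row, col * n_aa + k] = 1.0
    return X


# Hypothetical toy alignment: three sequences, four columns, one gap.
toy_alignment = ["ARND", "ARQD", "A-ND"]
X = one_hot_encode(toy_alignment)
```

A numeric matrix of this kind is what a multivariate method needs as input; a property-based encoding (for example, hydrophobicity or charge per residue) would be a drop-in alternative with fewer variables per column.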

CONCLUSION

The combination of methods in this paper is powerful and flexible, while being computationally very fast and simple. BGA is especially useful because it can be used to analyse any number of functional classes. The examples in this paper use only 2 or 3 classes for demonstration purposes, but any number can be used and visualised.
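To make the idea of a between-group ordination concrete, here is a minimal sketch under stated assumptions: the alignment has already been encoded as a numeric matrix, and the ordination is a PCA-style decomposition of the group centroids weighted by group size. The abstract does not specify the ordination details, so this should not be read as the authors' implementation; `between_group_analysis` and the toy data are hypothetical.

```python
import numpy as np


def between_group_analysis(X, labels):
    """PCA-style between-group ordination (assumed variant, for illustration).

    X      : (n_sequences, n_variables) numeric encoding of an alignment
    labels : functional class of each sequence (any number of classes)
    Returns the class names, the variable loadings (n_variables, n_classes - 1)
    and the projection of every sequence onto the between-group axes.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    grand_mean = X.mean(axis=0)
    # Group centroids, centred on the overall mean.
    centroids = np.vstack(
        [X[labels == g].mean(axis=0) for g in groups]
    ) - grand_mean
    # Weight each centroid by the fraction of sequences in its group.
    weights = np.array([(labels == g).mean() for g in groups])
    _, _, Vt = np.linalg.svd(
        np.sqrt(weights)[:, None] * centroids, full_matrices=False
    )
    # g groups span at most g - 1 discriminating axes.
    n_axes = len(groups) - 1
    axes = Vt[:n_axes].T
    scores = (X - grand_mean) @ axes
    return groups, axes, scores


# Hypothetical toy data: variable 0 separates the two classes, the rest do not.
X = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
])
labels = ["a", "a", "a", "b", "b", "b"]
groups, axes, scores = between_group_analysis(X, labels)

# Variables with the largest absolute loading are candidate
# specificity-determining positions.
top_variable = int(np.argmax(np.abs(axes[:, 0])))
```

Because the number of axes is the number of classes minus one, two classes give a single axis and three classes give a plane that can be plotted directly, which matches the abstract's point that any number of classes can be analysed and visualised.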

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6735/1878507/51c84ec2f0cb/1471-2105-8-135-1.jpg
