Nature of protein family signatures: insights from singular value analysis of position-specific scoring matrices.

Author information

Kinjo Akira R, Nakamura Haruki

Affiliation

Institute for Protein Research, Osaka University, Suita, Osaka, Japan.

Publication information

PLoS One. 2008 Apr 9;3(4):e1963. doi: 10.1371/journal.pone.0001963.

Abstract

Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on the protein. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor potentially but falsely functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. In addition, singular vectors may be useful for analyzing and annotating the characteristics of conserved sites in protein families.
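
The central operation of the abstract can be illustrated with a small sketch. Writing an L x 20 PSSM as P = sum_k sigma_k u_k v_k^T, each left singular vector u_k has one entry per alignment position and each right singular vector v_k has one entry per amino acid, so v_k can be correlated with a 20-valued amino acid index and u_k with a per-site quantity such as conservation; the "contribution of the first singular component" is the rank-one term sigma_1 u_1 v_1^T. The code below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it assumes the PSSM is a plain list of rows, obtains the leading singular triplets by power iteration with deflation (a swapped-in technique; in practice a library routine such as numpy.linalg.svd would normally be used), and compares vectors with the Pearson correlation coefficient. The function names, the random demo PSSM, and the stand-in amino acid index are hypothetical.

```python
import math
import random


def matvec(a, x):
    """Return A x for a matrix A stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def tmatvec(a, y):
    """Return A^T y without building the transpose."""
    out = [0.0] * len(a[0])
    for row, yi in zip(a, y):
        for j, aij in enumerate(row):
            out[j] += aij * yi
    return out


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def top_singular_triplets(a, k=2, iters=500, seed=0):
    """Return [(sigma, u, v), ...] for the k largest singular values of A.

    Power iteration on A^T A converges to the dominant right singular vector v
    (fastest when the top singular values are well separated); then
    u = A v / sigma. Subtracting sigma * u v^T (deflation) exposes the next one.
    """
    rng = random.Random(seed)
    residual = [[float(t) for t in row] for row in a]
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
        nv = norm(v)
        v = [t / nv for t in v]
        for _ in range(iters):
            w = tmatvec(residual, matvec(residual, v))
            n = norm(w)
            if n == 0.0:  # residual is zero: no further components
                break
            v = [t / n for t in w]
        av = matvec(residual, v)
        sigma = norm(av)
        u = [t / sigma for t in av] if sigma > 0.0 else av
        triplets.append((sigma, u, v))
        # Deflate: remove the rank-one term sigma * u v^T just found.
        residual = [[rij - sigma * ui * vj for rij, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return triplets


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: a random 50-position "PSSM" and a random stand-in
    # for an amino acid index (e.g. a hydrophobicity scale); not real values.
    rng = random.Random(1)
    pssm = [[rng.randint(-4, 6) for _ in range(20)] for _ in range(50)]
    aa_index = [rng.uniform(-2.0, 2.0) for _ in range(20)]
    for rank, (sigma, u, v) in enumerate(top_singular_triplets(pssm), start=1):
        # Singular vectors are defined only up to sign, so the sign of r is arbitrary.
        print(f"component {rank}: sigma = {sigma:.2f}, "
              f"r(v, index) = {pearson(v, aa_index):+.2f}")
```

Because singular vectors are determined only up to sign, a correlation between a singular vector and an amino acid index or a conservation score is best read through its magnitude, or after fixing a sign convention.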
