Functional region prediction with a set of appropriate homologous sequences--an index for sequence selection by integrating structure and sequence information with spatial statistics.

Author information

Nemoto Wataru, Toh Hiroyuki

Affiliation

Computational Biology Research Center (CBRC), Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.

Publication information

BMC Struct Biol. 2012 May 29;12:11. doi: 10.1186/1472-6807-12-11.

DOI: 10.1186/1472-6807-12-11

PMID: 22643026

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3533907/

Abstract

BACKGROUND

The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions.
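The abstract does not say how residue conservation is scored from the homologous sequences. One simple and widely used choice is one minus the normalized Shannon entropy of each alignment column. The sketch below uses that choice purely as an illustration. The function name `column_conservation` and the policy of ignoring gaps are assumptions of this sketch, not the authors' method.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return 1 - normalized Shannon entropy for each alignment column.

    alignment: list of equal-length aligned sequences, with '-' as the gap symbol.
    Gaps are ignored when counting residues (an assumption of this sketch).
    A score of 1.0 means the column is fully conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    max_entropy = math.log(20)  # entropy of a uniform distribution over 20 amino acids
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)  # all-gap column: treated as unconserved
            continue
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

For example, `column_conservation(["ACDE", "ACDE"])` scores every column as 1.0. This shows why the abstract notes that some sequence divergence is needed: if the homologs are nearly identical, every position looks conserved and the score no longer separates functional residues from the rest.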

RESULTS

We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is at its maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods.
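The abstract does not name the specific spatial statistic behind the index. The sketch below is therefore only an illustration of the general idea of scoring how strongly per-residue conservation clusters in 3D and then keeping the candidate sequence set with the highest score. It uses Moran's I, a standard spatial-autocorrelation measure, as a stand-in.

The following are all assumptions of this sketch, not details from the paper:

- binary neighbour weights based on an 8 Å distance cutoff;
- one coordinate per residue, for example C-alpha positions;
- the helper names `morans_i` and `select_best_set`;
- the toy data.

```python
import math


def morans_i(coords, values, cutoff=8.0):
    """Moran's I of per-residue values using binary distance weights.

    coords: one (x, y, z) point per residue (C-alpha atoms assumed here).
    values: per-residue conservation scores, in the same order as coords.
    cutoff: residue pairs within this distance (angstroms) count as neighbours.
    Positive values mean similar scores sit close together in space.
    """
    n = len(values)
    if n != len(coords) or n < 2:
        raise ValueError("need at least two residues with matching coordinates")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                w_sum += 1.0
                num += dev[i] * dev[j]
    if denom == 0.0 or w_sum == 0.0:
        return 0.0  # no variation or no neighbours: clustering is undefined, score as 0
    return (n / w_sum) * (num / denom)


def select_best_set(coords, candidate_scores, cutoff=8.0):
    """Return the key of the candidate homolog set whose conservation
    scores are most spatially clustered (highest Moran's I)."""
    return max(candidate_scores,
               key=lambda name: morans_i(coords, candidate_scores[name], cutoff))


# Toy example: two spatially separated groups of three residues (hypothetical data).
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (100, 0, 0), (103, 0, 0), (106, 0, 0)]
candidates = {
    "set_A": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # conserved residues form one patch
    "set_B": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # conserved residues are scattered
}
print(select_best_set(coords, candidates))  # set_A
```

Here `set_A` scores 1.0 and `set_B` scores about -0.33, so `set_A` is selected. The abstract does not describe how candidate sets are generated, so the dictionary keys are placeholders. A set with no variation in conservation, such as one built from near-identical homologs, scores 0 rather than being favoured.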

CONCLUSIONS

Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.
