Functional region prediction with a set of appropriate homologous sequences--an index for sequence selection by integrating structure and sequence information with spatial statistics.

Author information

Nemoto Wataru, Toh Hiroyuki

Affiliation

Computational Biology Research Center (CBRC), Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.

Publication information

BMC Struct Biol. 2012 May 29;12:11. doi: 10.1186/1472-6807-12-11.

Abstract

BACKGROUND

Detecting clusters of conserved residues on a protein structure is an effective strategy for predicting the functional regions of proteins, and various methods, such as Evolutionary Trace, have been built on it. In these approaches, conserved residues are identified by comparing homologous amino acid sequences, so the selection of homologous sequences is a critical step. Empirically, the set of homologous sequences must contain a certain degree of sequence divergence for conserved residues to be identified. However, the question of how to select homologous sequences suited to identifying conserved residues has not been sufficiently addressed. An objective, general method for selecting appropriate homologous sequences is needed for efficient prediction of functional regions.
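The abstract does not say how residue conservation is scored. As a hypothetical illustration only (not the authors' measure), the sketch below scores each alignment column by Shannon entropy, where 0 bits means fully conserved. It also shows why some divergence is needed: in a set of identical sequences every column scores 0, so no position stands out as specifically conserved. The function names and toy sequences are assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved."""
    n = len(column)
    # Sum of p * log2(1/p) over residue frequencies p = c/n.
    # Gap characters, if present, are counted as ordinary symbols in this sketch.
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignments.
too_similar = ["MKTAY", "MKTAY", "MKTAY"]  # no divergence: every column looks conserved
diverse = ["MKTAY", "MRSAF", "MKNAW"]      # some divergence: only positions 0 and 3 are invariant

print(conservation_profile(too_similar))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(conservation_profile(diverse))
```

With the divergent set, positions 0 and 3 keep an entropy of 0 while the others rise, which is the kind of contrast a conservation-based method needs.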

RESULTS

We have developed a novel index for selecting sequences appropriate for identifying conserved residues, and implemented it in our method for predicting the functional regions of a protein. Implementing the index improved functional region prediction performance. The index represents the degree to which conserved residues cluster on the tertiary structure of the protein. To capture this, the index integrates structure and sequence information by applying spatial statistics, a field of statistics in which the attributes and the geometric coordinates of the data are considered simultaneously. A higher degree of clustering yields a larger index score. We adopted the set of homologous sequences with the highest index score, under the assumption that prediction accuracy is best when the degree of clustering is at its maximum. The set of sequences selected by the index yielded higher functional region prediction performance than the sets selected by other sequence-based methods.
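The abstract does not name the spatial statistic behind the index. The sketch below swaps in Moran's I, a well-known spatial autocorrelation statistic, purely to illustrate the idea of scoring how strongly per-residue conservation clusters in 3D and then keeping the candidate sequence set with the highest score. It is not necessarily the index defined in the paper. The 8 Å contact cutoff, the binary weights, the function names, and the toy coordinates and scores are all hypothetical.

```python
import math

Coord = tuple[float, float, float]


def contact_weights(coords: list[Coord], cutoff: float = 8.0) -> list[list[float]]:
    """Binary spatial weights: w[i][j] = 1 when residues i != j lie within `cutoff` (angstroms)."""
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0 for j in range(n)]
        for i in range(n)
    ]


def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Moran's I: positive when similar per-residue scores sit near each other in space."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(map(sum, weights))
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        return 0.0  # no spatial neighbours or no score variation: report no clustering
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (num / denom)


def select_best_set(candidates: dict[str, list[float]], coords: list[Coord]) -> str:
    """Pick the candidate sequence set whose conservation scores cluster most strongly."""
    weights = contact_weights(coords)
    return max(candidates, key=lambda name: morans_i(candidates[name], weights))


# Toy structure: two groups of three residues spaced 5 angstroms apart, groups 20 angstroms apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
          (30.0, 0.0, 0.0), (35.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
# Hypothetical per-residue conservation scores derived from two candidate homolog sets.
candidates = {
    "clustered": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "scattered": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
print(select_best_set(candidates, coords))  # clustered
```

Because only residue pairs within the cutoff contribute, the "clustered" set, whose conserved residues are spatial neighbours, scores about +1, while the alternating "scattered" set scores about -1, so the clustered set is selected. A real pipeline would take coordinates from the structure (for example, C-alpha atoms) and conservation scores from alignments of each candidate set; those steps are outside this sketch.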

CONCLUSIONS

The index selects appropriate homologous sequences automatically and objectively, and this selection improved the performance of functional region prediction. To our knowledge, this is the first application of spatial statistics to protein analysis. This kind of integration of structure and sequence information may also be useful for other bioinformatics problems.
