Predicting functional sites with an automated algorithm suitable for heterogeneous datasets.

Author information

La David, Livesay Dennis R

Affiliation

Department of Biological Sciences, California State Polytechnic University, Pomona, California 91768, USA.

Publication information

BMC Bioinformatics. 2005 May 13;6:116. doi: 10.1186/1471-2105-6-116.

Abstract

BACKGROUND

In a previous report (La et al., Proteins, 2005), we demonstrated that the identification of phylogenetic motifs, protein sequence fragments that conserve the overall familial phylogeny, represents a promising approach for sequence/function annotation. Across a structurally and functionally heterogeneous dataset, phylogenetic motifs were shown to correspond to a wide variety of functional site archetypes, including those defined by surface loops, active site clefts, and less exposed regions. However, in our original demonstration of the technique, phylogenetic motif identification depended on a manually determined similarity threshold, which prohibited large-scale application of the technique.

RESULTS

In this report, we present an algorithmic approach that determines thresholds without human subjectivity. The approach relies on substantial preprocessing of the raw data to improve signal detection. Subsequently, Partition Around Medoids Clustering (PAMC) of the similarity scores assesses sequence fragments whose functional annotation remains in question. The accuracy of the approach is confirmed through comparisons to our previous (manual) results and through structural analyses. Triosephosphate isomerase and arginyl-tRNA synthetase are discussed as exemplar cases. A quantitative functional site prediction assessment algorithm indicates that the phylogenetic motif predictions, which require sequence information only, are nearly as good as those from evolutionary trace methods that do incorporate structure.
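
To illustrate the general idea of clustering similarity scores around medoids, the following is a minimal sketch only, not the authors' implementation (which the abstract says is provided in R and includes preprocessing steps not reproduced here). It runs a simple swap-based k-medoids search with k = 2 on a one-dimensional list of scores and reports the midpoint between the two resulting groups as a candidate threshold. The function names (`pam`, `two_cluster_threshold`), the initialisation scheme, the midpoint rule, and the example scores are all assumptions made for illustration.

```python
def total_cost(scores, medoids):
    """Sum of distances from every score to its nearest medoid."""
    return sum(min(abs(s - scores[m]) for m in medoids) for s in scores)


def pam(scores, k=2):
    """Swap-based k-medoids on 1-D data (illustrative, not optimised).

    Returns the medoid indices and, for each score, the position of its
    nearest medoid in that list.
    """
    n = len(scores)
    # Assumed initialisation: evenly spaced points in sorted order.
    order = sorted(range(n), key=lambda i: scores[i])
    medoids = [order[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    best = total_cost(scores, medoids)

    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(scores, trial)
                if cost < best:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True

    labels = [min(range(k), key=lambda j: abs(s - scores[medoids[j]]))
              for s in scores]
    return medoids, labels


def two_cluster_threshold(scores):
    """Midpoint between the two clusters (hypothetical threshold rule).

    Assumes the scores contain at least two distinct values.
    """
    medoids, labels = pam(scores, k=2)
    low = min(range(2), key=lambda j: scores[medoids[j]])
    low_members = [s for s, lab in zip(scores, labels) if lab == low]
    high_members = [s for s, lab in zip(scores, labels) if lab != low]
    return (max(low_members) + min(high_members)) / 2


# Made-up scores with two visibly separate groups.
example_scores = [0.10, 0.12, 0.15, 0.11, 0.90, 0.95, 0.92]
example_medoids, example_labels = pam(example_scores, k=2)
example_threshold = two_cluster_threshold(example_scores)
```

Which of the two groups would be treated as motif candidates depends on how the similarity score is defined, which this sketch deliberately leaves open.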

CONCLUSION

The automated threshold detection algorithm has been incorporated into MINER, our web-based phylogenetic motif identification server. MINER is freely available on the web at http://www.pmap.csupomona.edu/MINER/. Pre-calculated functional site predictions of the COG database and an implementation of the threshold detection algorithm, in the R statistical language, can also be accessed at the website.
