PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids.

Author information

Ladunga I

Affiliation

SmithKline Beecham Pharmaceuticals, Bioinformatics Department, King of Prussia, PA 19406-0939, USA.

Publication information

Bioinformatics. 1999 Dec;15(12):1028-38. doi: 10.1093/bioinformatics/15.12.1028.

Abstract

MOTIVATION

PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics, such as various measures of hydrophobicity, structural propensity and steric properties. These characteristics are calculated from multiple positions in a sequence. They may be conserved even between sequences that cannot be aligned at any acceptable level of statistical significance. PHYSEAN complements alignment-based methods (BLAST, FASTA, dynamic programming) by adding physicochemical information about the protein or domain that is less specific to individual residues and positions.
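The idea of a characteristic "calculated from multiple positions in a sequence" can be illustrated by averaging a per-residue property scale over an N-terminal window. The sketch below is not PHYSEAN's implementation. It stands in two well-known scales, Kyte-Doolittle hydropathy and a crude ±1 side-chain charge, plus a hypothetical 30-residue window and hypothetical function names (`window_average`, `property_profile`). PHYSEAN's actual property set and windowing may differ.

```python
"""Sketch: average amino-acid property scales over an N-terminal window.

ASSUMPTION: the scales, the 30-residue window and all names here are
illustrative stand-ins, not the property set used by PHYSEAN.
"""
from typing import Dict, Optional

Scale = Dict[str, float]

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE: Scale = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude side-chain charge near neutral pH (histidine treated as neutral).
CHARGE: Scale = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def window_average(seq: str, scale: Scale, start: int = 0,
                   length: Optional[int] = None) -> float:
    """Mean scale value over seq[start:start + length].

    Residues absent from the scale (e.g. 'X') are skipped.
    """
    end = len(seq) if length is None else start + length
    values = [scale[aa] for aa in seq[start:end].upper() if aa in scale]
    if not values:
        raise ValueError("window contains no scored residues")
    return sum(values) / len(values)


def property_profile(seq: str, scales: Dict[str, Scale],
                     length: int = 30) -> Dict[str, float]:
    """One averaged value per property over the first `length` residues."""
    return {name: window_average(seq, scale, 0, length)
            for name, scale in scales.items()}


if __name__ == "__main__":
    scales = {"hydropathy": KYTE_DOOLITTLE, "charge": CHARGE}
    # Hypothetical N-terminal fragment, used only to exercise the code.
    print(property_profile("MKKLLLLLLAVLA", scales))
```

Because each value is an average over many positions, two sequences with no alignable residues can still receive similar profiles. This is the kind of signal the abstract argues alignment-based methods miss.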

RESULTS

We predict proteins or their domains, such as signal peptides, using physical, chemical, geometric and biological properties of the 20 amino acids. This comprehensive set of properties may capture the diagnostic functional and structural features of a domain or a protein class. We automatically select and weight a subset of properties that discriminates between two classes with the lowest number of incorrect predictions; for example, signal peptides versus the amino termini of cytosolic proteins. This optimal selection of properties and weights significantly reduces the number of incorrect predictions compared with any single property or any combination of unweighted properties. The weights were optimized with high-performance linear programming models. These models systematically find the optimal solution among an astronomical number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate that automated physical and chemical analysis of proteins can yield reliable predictions even in the absence of sequence conservation.
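The select-and-weight step can be pictured as a linear score. Each property value is multiplied by a weight, the products are summed, and the sum is compared with a threshold; a property with weight 0 is effectively deselected. The sketch below uses hypothetical names and invented toy numbers. It only evaluates a given weight vector and counts incorrect predictions, which the abstract says is the quantity being minimized. It does not reproduce the linear programming models PHYSEAN uses to find the weights.

```python
"""Sketch: weighted linear combination of property values and error count.

ASSUMPTION: weights and threshold are supplied by the caller. PHYSEAN
obtains its weights by linear programming, which is not reproduced here.
"""
from typing import Dict, Sequence

Profile = Dict[str, float]


def weighted_score(profile: Profile, weights: Dict[str, float]) -> float:
    """Sum of weight * property value; properties without a weight are ignored."""
    return sum(w * profile[name] for name, w in weights.items())


def predict(profile: Profile, weights: Dict[str, float], threshold: float) -> bool:
    """True = predicted positive class (e.g. signal peptide)."""
    return weighted_score(profile, weights) > threshold


def count_errors(profiles: Sequence[Profile], labels: Sequence[bool],
                 weights: Dict[str, float], threshold: float) -> int:
    """Number of sequences whose prediction disagrees with the label."""
    return sum(predict(p, weights, threshold) != y
               for p, y in zip(profiles, labels))


if __name__ == "__main__":
    # Invented toy profiles: two positives followed by two negatives.
    profiles = [{"h": 1.0, "c": 0.5}, {"h": 3.0, "c": -0.4},
                {"h": 2.0, "c": -0.5}, {"h": 0.0, "c": 0.6}]
    labels = [True, True, False, False]
    print(count_errors(profiles, labels, {"h": 1.0, "c": 1.0}, 1.4))  # unweighted sum → 1
    print(count_errors(profiles, labels, {"h": 1.0, "c": 2.0}, 1.5))  # weighted → 0
```

In this toy data, neither property alone separates the classes. The unweighted sum also gives the first positive and the first negative the same score, while weighting `c` twice as heavily separates all four. An exact error count is not a linear objective. Linear-programming approaches to discrimination therefore typically minimize a continuous surrogate, such as the total margin violation. The abstract does not state the exact objective PHYSEAN optimizes.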
