College of Bioengineering, Chongqing University, 400030, Chongqing, China.
Amino Acids. 2009 Oct;37(4):583-91. doi: 10.1007/s00726-008-0177-8. Epub 2008 Sep 28.
On the basis of exploratory factor analysis, six multidimensional patterns derived from 516 amino acid attributes are proposed to represent the structures of 48 bitter-tasting dipeptides and 58 angiotensin-converting enzyme inhibitors. These patterns, termed factor analysis scales of generalized amino acid information (FASGAI), cover hydrophobicity, alpha and turn propensities, bulky properties, compositional characteristics, local flexibility and electronic properties. Characteristic parameters related to the bioactivities of the peptides studied are selected by a genetic algorithm, and quantitative structure-activity relationship (QSAR) models are constructed by partial least squares (PLS). The leave-one-out cross-validation results are compared with those of the previously known structure representation method and show slightly superior or comparable performance. Further, the two data sets are each divided into a training set and a test set to validate the characterization ability of FASGAI. The PLS models built on the training samples perform satisfactorily both in leave-one-out cross-validation and in external validation on the test samples. These results demonstrate that FASGAI is an effective technique for representing peptide structures, and that FASGAI vectors offer several advantages, such as straightforward physicochemical meaning, high characterization ability and ease of use. They can be further applied to investigate the relationship between structure and function of various peptides, and even proteins.
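The modelling workflow summarized in the abstract (encode each residue by six factor scores, fit a PLS model, judge it by leave-one-out cross-validation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the per-residue scores are random placeholders and not the published FASGAI values, the peptides and activities are synthetic, the helper names (`encode`, `pls1_fit`, `pls1_predict`, `loo_q2`) are hypothetical, and the genetic-algorithm variable selection step is omitted. The PLS routine is a plain single-response NIPALS implementation, which may differ in detail from the one the authors used.

```python
import math
import random

rng = random.Random(0)
RESIDUES = "AVLIFWGDKE"
# Placeholder six-factor scores per residue. These numbers are random and
# are NOT the published FASGAI values.
SCORES = {aa: [rng.uniform(-1, 1) for _ in range(6)] for aa in RESIDUES}


def encode(peptide):
    """Concatenate the six scores of each residue (dipeptide -> 12 values)."""
    return [v for aa in peptide for v in SCORES[aa]]


def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the component just extracted.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def loo_q2(X, y, n_components):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot (SS_tot about the overall mean)."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss_tot


# Synthetic demo: 30 random dipeptides with a noisy linear "activity".
peptides = [rng.choice(RESIDUES) + rng.choice(RESIDUES) for _ in range(30)]
X = [encode(p) for p in peptides]
true_coef = [rng.uniform(-1, 1) for _ in range(12)]
y = [sum(c * v for c, v in zip(true_coef, x)) + rng.gauss(0, 0.1) for x in X]
print("LOO Q2 with 3 components:", round(loo_q2(X, y, 3), 3))
```

In practice the number of PLS components would itself be chosen by cross-validation, and an external test set, as described in the abstract, would be held out before any model selection.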