Li ZhiLiang, Wu ShiRong, Chen ZeCong, Ye Nancy, Yang ShengXi, Liao ChunYang, Zhang MengJun, Yang Li, Mei Hu, Yang Yan, Zhao Na, Zhou Yuan, Zhou Ping, Xiong Qing, Xu Hong, Liu ShuShen, Ling ZiHua, Chen Gang, Li GenRong
College of Chemistry and Chemical Engineering/Key Laboratory for Chemobiomedical Science and Engineering under Chongqing Municipality, Chongqing University, Chongqing 400044, China.
Sci China C Life Sci. 2007 Oct;50(5):706-16. doi: 10.1007/s11427-007-0080-7.
A new set of descriptors, the molecular electronegativity edge-distance vector (VMED), was derived solely from the primary structures of peptides and applied to describing and characterizing the molecular structures of oligopeptides and polypeptides. It is based on the electronegativity of each atom, or the electronic charge index (ECI) of atomic clusters, and on the bonding distance between atom pairs. Here, the molecular structures of antigenic polypeptides were expressed with VMED in order to develop an automated technique for the computerized identification of helper T lymphocyte (Th) epitopes. Furthermore, a modified MED vector was derived from the primary structures of polypeptides, based on the ECI and the relative bonding distance of the fundamental skeleton groups, with the side chain of each amino acid treated as a pseudo-atom. The resulting VMED was easy to calculate and practical to apply. Quantitative models were established for 28 immunogenic or antigenic polypeptides (AGPP): 14 A(d)-restricted sequences (1-14) and 14 sequences with other restrictions, assigned activities of "1" (+) and "0" (-), respectively. The latter comprised 6 A(b) (15-20), 3 A(k) (21-23), 3 E(k) (24-26), and 2 H-2(k) (27 and 28) restricted sequences. Good results were obtained: in one case, 90% correct classification (2 wrong out of 20 training samples) and 100% correct prediction (none wrong out of 8 testing samples); by contrast, in the other, 100% correct classification (none wrong out of 20 training samples) and 88% correct prediction (1 wrong out of 8 testing samples). Both stochastic sampling and cross-validation were performed to demonstrate good performance. The described method may also be suitable for estimating and predicting human major histocompatibility complex (MHC) class I and class II epitopes. It should be useful in the immune identification and recognition of proteins and genes and in the design and development of subunit vaccines.
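The sample counts and percentages quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the published work: the labels "first result" and "second result" and the helper name `percent_correct` are assumptions introduced here for illustration, and the group sizes are derived only from the sequence index ranges given in the abstract.

```python
def percent_correct(n_wrong, n_total):
    """Percentage of samples assigned to the right class."""
    return 100 * (n_total - n_wrong) / n_total

# (label, number wrong, number of samples), as quoted in the abstract.
# The "first"/"second" labels are illustrative names, not the authors' terms.
reported = [
    ("first result, training", 2, 20),
    ("first result, testing", 0, 8),
    ("second result, training", 0, 20),
    ("second result, testing", 1, 8),
]
rates = {label: percent_correct(wrong, total) for label, wrong, total in reported}

# Inclusive index ranges of the sequences that are not A(d)-restricted.
ranges = {"A(b)": (15, 20), "A(k)": (21, 23), "E(k)": (24, 26), "H-2(k)": (27, 28)}
group_sizes = {name: hi - lo + 1 for name, (lo, hi) in ranges.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f}% correct")
print(group_sizes, "total:", sum(group_sizes.values()))
```

One wrong prediction out of 8 testing samples is 87.5% correct, which the abstract rounds to 88%, and the four index ranges together account for the 14 sequences assigned activity "0".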
Several quantitative structure-activity relationship (QSAR) models were developed by multiple linear regression (MLR) for various oligopeptides and polypeptides, including 58 dipeptides and 31 pentapeptides with angiotensin-converting enzyme (ACE) inhibitory activity. To examine the ability of the molecular electronegativity edge-distance vector (VMED) to characterize the molecular structure of polypeptides, a molecular modeling investigation was performed on the functional prediction of polypeptide sequences with antigenic activity and heptapeptide sequences with tachykinin activity, using quantitative sequence-activity models (QSAMs). The results showed that VMED exhibited both excellent structural selectivity and good activity prediction. VMED performed well for both QSAR and QSAM of poly- and oligopeptides, with estimation ability and predictive power equal to or better than those reported in previous references. A preliminary conclusion was drawn: both the classical and the modified MED vectors are very useful structural descriptors. Some suggestions were made for further studies on QSAR/QSAM of proteins in various fields.
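To make the MLR step concrete, here is a minimal sketch of fitting a linear QSAR model by ordinary least squares. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_mlr` and `predict` are hypothetical, the descriptor values are synthetic placeholders rather than real VMED elements, and the activities are generated from a known linear rule so the fit can be verified.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of descriptor rows; an intercept column is prepended here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(p):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][p] for i in range(p)]

def predict(coef, x):
    """Predicted activity for one descriptor row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Synthetic, illustrative data only: two hypothetical descriptor elements per peptide.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
# Activities generated from a known rule so the recovered coefficients can be checked.
y = [1.0 + 2.0 * a - 0.5 * b for a, b in X]

coef = fit_mlr(X, y)
print([round(c, 6) for c in coef])
```

In a real study the rows of `X` would hold the descriptor elements computed for each peptide and `y` the measured activities, and a numerical library would normally replace the hand-written solver.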