Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China.
PLoS One. 2012;7(6):e39369. doi: 10.1371/journal.pone.0039369. Epub 2012 Jun 22.
Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and developing potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method found that windows of long amino acid segments of ~30 residues, rather than the commonly used short hexapeptides, gave the highest accuracy. The weighted contributions of the amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif of Aβ peptide amyloid and the β-solenoid structure of the HET-s(218-289) prion (C). Using an in-house energy evaluation algorithm, we computed the interaction energy between two short stretches within a long segment and incorporated it as an additional feature. The algorithm predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments depend not only on short high-propensity stretches but also on nearby residues.
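To make the window-based representation concrete, the sketch below splits a protein sequence into every full-length 27-residue window, encodes each window as a position-by-residue one-hot vector, and scores it with a linear model whose weights can be read back as per-position amino acid contributions. This is a minimal illustration under stated assumptions, not the authors' method: the one-hot encoding, the linear scoring, the treatment of non-standard residues, and all function names (`sliding_windows`, `one_hot`, `linear_score`, `position_contributions`) are hypothetical, and the interaction-energy feature is not modeled.

```python
from typing import List, Sequence, Tuple

# The 20 standard amino acids; this ordering is an arbitrary (assumed) encoding choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

WINDOW = 27  # window length reported in the abstract


def sliding_windows(sequence: str, window: int = WINDOW) -> List[Tuple[int, str]]:
    """Return (start index, segment) for every full-length window in the sequence."""
    if len(sequence) < window:
        return []
    return [(i, sequence[i:i + window]) for i in range(len(sequence) - window + 1)]


def one_hot(segment: str) -> List[float]:
    """Flatten a segment into a (length * 20) one-hot vector, one block of 20 per position.

    Hypothetical choice: non-standard residues are left as all-zero blocks.
    """
    n = len(AMINO_ACIDS)
    vec = [0.0] * (len(segment) * n)
    for pos, aa in enumerate(segment):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            vec[pos * n + idx] = 1.0
    return vec


def linear_score(vec: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Score a window with a hypothetical linear model: one weight per (position, residue) pair."""
    if len(vec) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return bias + sum(v * w for v, w in zip(vec, weights))


def position_contributions(segment: str, weights: Sequence[float]) -> List[float]:
    """Weight of the residue actually present at each position (0.0 for non-standard residues)."""
    n = len(AMINO_ACIDS)
    return [
        weights[pos * n + AA_INDEX[aa]] if aa in AA_INDEX else 0.0
        for pos, aa in enumerate(segment)
    ]
```

For a 30-residue sequence, `sliding_windows` yields four overlapping 27-residue windows, each mapped to a 540-dimensional vector. Because a linear model assigns one weight to each (position, residue) pair, plotting `position_contributions` along a window is one simple way to look for clustered high-weight regions of the kind the abstract describes.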