Embrapa Agricultural Informatics, Campinas, São Paulo, Brazil.
Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil.
PLoS One. 2018 Jul 10;13(7):e0200018. doi: 10.1371/journal.pone.0200018. eCollection 2018.
Protein secondary structure elements (PSSEs) such as α-helices, β-strands, and turns are the primary building blocks of the tertiary protein structure. Our primary interest here is to reveal the characteristics of the nanoenvironment formed by PSSEs and their surrounding amino acid residues (AARs), which might contribute to the general understanding of how proteins fold. The characteristics of such nanoenvironments must be specific to each secondary structure element, and our goal here is to gather the fullest possible description of the α-helical nanoenvironment. This postulate (that specific nanoenvironments exist for specific protein substructures, neighbourhoods, or regions with distinct functionality) has already been explored and confirmed for some protein regions, such as protein-protein interfaces and enzyme catalytic sites. Consequently, PSSEs were the obvious next choice for gathering further evidence that specific nanoenvironments (with characteristics fully describable by structural and physicochemical descriptors) do exist for the corresponding intraprotein regions. The nanoenvironment of α-helices (nEoαH) is defined as any region of the protein where this secondary structure element type is detected. The nEoαH therefore includes not only the α-helix amino acid residues but also the residues immediately around the α-helix. The hypothesis that motivated this work is that it might be possible to detect a postulated "signal" or "signature" that distinguishes the specific location of α-helices. This "signal" must be discernible by tracking how the values of physical, chemical, physicochemical, structural and geometric descriptors immediately before (or after) the PSSE differ from those in the region along the α-helices. The search for this specific nanoenvironment "signal" was made possible by aligning previously selected α-helices of equal length.
Afterward, we calculated the average value, standard deviation and mean square error at each aligned residue position for each selected descriptor. We applied Student's t-test, the Kolmogorov-Smirnov test and MANOVA to the dataset constructed as described above, and the results confirmed that the hypothesized "signal"/"signature" both exists and is identifiable, and that it can distinguish the presence of an α-helix inside the specific nanoenvironment, contextualized as a specific region within the whole protein. However, such a conclusion can rarely be reached if only one descriptor is considered at a time. A more accurate signal with broader coverage is achieved only with multivariate analysis, in which several descriptors (usually approximately 10) are considered at the same time. To a limited extent (up to a maximum of 15% of cases), such a conclusion is also possible with only a single descriptor, and it is possible in general for up to 50-80% of cases when no fewer than 5 nonlinear descriptors are selected and considered. Using all the descriptors considered in this work, and provided all assumptions about data characteristics for this analysis are met, multivariate analysis regularly reached coverage and accuracy above 90%. Understanding how secondary structure elements are formed and maintained within a protein structure could enable a more detailed understanding of how proteins reach their final 3D structure and, consequently, their function. Likewise, this knowledge may also improve the tools used to assess the quality of a structure, by comparing the "signal" around a selected PSSE with the one obtained from the best available protein structures in terms of resolution and quality.
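The per-position statistics and a single-descriptor comparison described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy descriptor values, and the two-residue flank width are all hypothetical, Welch's variant of the t statistic is swapped in for simplicity, and pooling values across positions treats them as independent, which real data may not satisfy.

```python
from math import sqrt
from statistics import mean, stdev


def per_position_stats(aligned):
    """Return (means, stds) per aligned residue position.

    `aligned` is a list of equal-length rows, one row per aligned
    helix window, holding the values of a single descriptor.
    """
    columns = list(zip(*aligned))  # transpose: one tuple per position
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical toy data: 3 aligned windows, each with 2 flanking
# residues followed by 4 helix residues, for one descriptor.
FLANK = 2  # assumed flank width, for illustration only
windows = [
    [0.1, 0.2, 0.9, 1.0, 1.1, 0.8],
    [0.2, 0.1, 1.0, 0.9, 1.2, 0.9],
    [0.3, 0.3, 0.8, 1.1, 1.0, 1.0],
]

means, stds = per_position_stats(windows)

# Pool flanking and in-helix values and compare them with one statistic.
flank_values = [v for row in windows for v in row[:FLANK]]
helix_values = [v for row in windows for v in row[FLANK:]]
t_stat = welch_t(helix_values, flank_values)
```

A large absolute value of `t_stat` would suggest that this one descriptor separates flank from helix in the toy data. The abstract reports that a single descriptor rarely suffices on real structures, which is why the multivariate analysis over roughly 10 descriptors is the approach the authors rely on.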