Gala Michal, Paul Evan David, Čekan Pavol, Žoldák Gabriel
MultiplexDX, s.r.o., Comenius University Science Park, Bratislava, Slovakia.
MultiplexDX, Inc., Rockville, MD, USA.
Methods Mol Biol. 2025;2870:153-182. doi: 10.1007/978-1-0716-4213-9_9.
This chapter describes how machine learning can be used to understand and predict the stability of protein substructures. Accurately identifying stable substructures within a protein requires taking the local context of each residue into account, which is also crucial for elucidating the roles of supersecondary structures. Including this contextual information gives a more complete picture of the stability and function of protein regions, and thus of protein mechanics and interactions. Using the DnaK Hsp70 chaperone protein as a case study, we show that context-dependent physico-chemical features derived from protein sequences allow logistic regression, random forest, and support vector machine classifiers to accurately assign residues to stable and unstable substructures. The findings are a step towards the rational design of proteins with tailored properties and offer new insights into protein engineering and the fundamental principles underpinning protein supersecondary structures.
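To make "context-dependent physico-chemical features" concrete, here is a minimal sketch of how per-residue features could be computed from a sequence by averaging over neighbouring residues. It is an illustration under stated assumptions, not the chapter's actual feature set: the Kyte-Doolittle hydropathy scale, the charged-residue fraction, the window size, and the names `context_features` and `half_window` are all chosen here for the example.

```python
# Kyte-Doolittle hydropathy scale. Using this particular scale is an
# assumption for illustration; the abstract does not name its features.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")


def context_features(sequence, half_window=2):
    """Return one (mean hydropathy, charged fraction) pair per residue.

    Each pair is computed over the residue and up to `half_window`
    neighbours on either side; windows are truncated at the termini.
    """
    features = []
    for i in range(len(sequence)):
        start = max(0, i - half_window)
        window = sequence[start:i + half_window + 1]
        mean_hydropathy = sum(KD[aa] for aa in window) / len(window)
        charged_fraction = sum(aa in CHARGED for aa in window) / len(window)
        features.append((mean_hydropathy, charged_fraction))
    return features


if __name__ == "__main__":
    peptide = "MGKIIGIDLG"  # arbitrary toy peptide, hypothetical input
    for residue, (hydropathy, charged) in zip(peptide, context_features(peptide)):
        print(f"{residue}  hydropathy={hydropathy:6.2f}  charged={charged:4.2f}")
```

Feature vectors of this kind, paired with per-residue stable/unstable labels, could then be passed to standard classifiers such as scikit-learn's `LogisticRegression`, `RandomForestClassifier`, or `SVC`. The labels would have to come from experimental data, which this sketch does not supply.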