Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA 19104.
Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637.
Proc Natl Acad Sci U S A. 2022 Nov 29;119(48):e2200018119. doi: 10.1073/pnas.2200018119. Epub 2022 Nov 21.
The hydrophobicity of proteins and similar surfaces, which display chemical heterogeneity at the nanoscale, drives countless aqueous interactions and assemblies. However, predicting how surface chemical patterning influences hydrophobicity remains a challenge. Here, we address this challenge by using molecular simulations and machine learning to characterize and model the hydrophobicity of a diverse library of patterned surfaces, spanning a wide range of sizes, shapes, and chemical compositions. We find that simple models based only on polar content are inaccurate, whereas complex neural network models are accurate but challenging to interpret. However, by systematically incorporating chemical correlations between surface groups into our models, we are able to construct a series of minimal models of hydrophobicity that are both accurate and interpretable. Our models highlight that the number of proximal polar groups is a key determinant of hydrophobicity and that polar neighbors enhance hydrophobicity. Although our minimal models are trained on a particular patch size and shape, their interpretability enables us to generalize them to rectangular patches of all shapes and sizes. We also demonstrate how our models can be used to predict hot-spot locations with the largest marginal contributions to hydrophobicity and to design chemical patterns that have a fixed polar content but vary widely in their hydrophobicity. Our data-driven models and the principles they furnish for modulating hydrophobicity could facilitate the design of novel materials and engineered proteins with stronger interactions or enhanced solubilities.
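As a rough illustration of the distinction the abstract draws between polar content and correlations between surface groups, the sketch below computes two descriptors for a binary surface pattern: the number of polar sites, and the number of adjacent polar-polar pairs. It is not the authors' model. The square grid, the four-neighbor adjacency, and the function names `polar_content` and `polar_neighbor_pairs` are all assumptions made for the example, and no fitted coefficients or hydrophobicity values are implied.

```python
# Illustrative sketch only: a pattern is a grid of 0/1 values, where 1 marks a
# polar site and 0 a nonpolar site. The square lattice and 4-neighbor adjacency
# are simplifying assumptions, not details taken from the abstract.

def polar_content(pattern):
    """Count polar sites (1s) in the grid."""
    return sum(sum(row) for row in pattern)


def polar_neighbor_pairs(pattern):
    """Count adjacent polar-polar pairs, counting each pair once.

    Only the right and down neighbors of each site are checked, so a pair
    is never counted twice.
    """
    rows, cols = len(pattern), len(pattern[0])
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            if not pattern[r][c]:
                continue
            if c + 1 < cols and pattern[r][c + 1]:
                pairs += 1
            if r + 1 < rows and pattern[r + 1][c]:
                pairs += 1
    return pairs


# Two hypothetical patterns with the same polar content but different
# arrangements of the polar sites.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dispersed = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]

for name, pattern in (("clustered", clustered), ("dispersed", dispersed)):
    print(name, polar_content(pattern), polar_neighbor_pairs(pattern))
```

Both patterns have four polar sites, so a descriptor based only on polar content cannot tell them apart, while the neighbor-pair count separates them. This is the kind of fixed-content, varying-arrangement comparison the abstract describes.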