Pavlova Anna, Fan Zixing, Lynch Diane L, Gumbart James C
School of Physics, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
J Chem Inf Model. 2025 May 26;65(10):4844-4853. doi: 10.1021/acs.jcim.5c00274. Epub 2025 May 8.
An effective approach in the development of novel antivirals is to target the assembly of viral capsids by using capsid assembly modulators (CAMs). CAMs targeting hepatitis B virus (HBV) have two major modes of function: they can either accelerate nucleocapsid assembly, retaining its structure, or misdirect it into noncapsid-like particles. Previous molecular dynamics (MD) simulations of early capsid-assembly intermediates showed differences in protein conformations between the apo and bound states. Here, we have developed and tested several classification machine learning (ML) models to better distinguish between apo-tetramer intermediates and those bound to accelerating or misdirecting CAMs. Two types of models were tested: those based on tertiary structural properties of the Cp149 tetramers and their interdimer orientation, and those based on direct and inverse contact distances between protein residues. All models distinguished the apo states and the two CAM-bound states with high accuracy. Furthermore, tertiary structure models and residue-distance models highlighted different tetramer regions as being important for classification. Both types of models can be used to better understand structural transitions that govern the assembly of nucleocapsids and to assist in the development of more potent CAMs. Finally, we demonstrate the utility of classification ML methods in comparing MD trajectories and describe our ML approaches, which can be extended to other systems of interest.
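The residue-distance idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the classifiers, features, or software used, so everything below is an assumption made for illustration. The toy "frames" are synthetic coordinates rather than MD data, the state-dependent shift values are invented, and a nearest-centroid rule stands in for whatever classification models the study used. The sketch only shows how direct and inverse pairwise distances can serve as features for separating three labeled states, and how the most discriminating residue pair can be read off afterwards.

```python
import math
import random
from itertools import combinations

STATES = ("apo", "accelerator", "misdirector")
# Hypothetical toy displacement (in angstroms) of the last residue per state.
SHIFT = {"apo": 0.0, "accelerator": 1.5, "misdirector": -1.5}


def synthetic_frame(rng, state, n_residues=6, noise=0.1):
    """Toy stand-in for one MD frame: noisy residues spaced along the x axis."""
    coords = [
        (4.0 * i + rng.gauss(0, noise), rng.gauss(0, noise), rng.gauss(0, noise))
        for i in range(n_residues)
    ]
    x, y, z = coords[-1]
    coords[-1] = (x + SHIFT[state], y, z)  # state-dependent shift of one residue
    return coords


def distance_features(coords, inverse=False):
    """All pairwise residue distances, or their reciprocals if inverse=True."""
    dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return [1.0 / d for d in dists] if inverse else dists


def fit_centroids(X, y):
    """Mean feature vector per class (nearest-centroid 'training')."""
    grouped = {}
    for features, label in zip(X, y):
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in feature space."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def top_pair(centroids, n_residues):
    """Residue pair whose mean feature value varies most across classes."""
    pairs = list(combinations(range(n_residues), 2))
    spreads = [max(col) - min(col) for col in zip(*centroids.values())]
    return pairs[spreads.index(max(spreads))]


def run(inverse, seed=0, n_train=30, n_test=10, n_residues=6):
    """Train and evaluate on synthetic frames; return (accuracy, top pair)."""
    rng = random.Random(seed)

    def make(n):
        X, y = [], []
        for state in STATES:
            for _ in range(n):
                frame = synthetic_frame(rng, state, n_residues)
                X.append(distance_features(frame, inverse))
                y.append(state)
        return X, y

    X_train, y_train = make(n_train)
    X_test, y_test = make(n_test)
    centroids = fit_centroids(X_train, y_train)
    correct = sum(predict(centroids, x) == t for x, t in zip(X_test, y_test))
    return correct / len(y_test), top_pair(centroids, n_residues)


if __name__ == "__main__":
    for inverse in (False, True):
        acc, pair = run(inverse)
        print(f"inverse={inverse}: accuracy={acc:.2f}, top pair={pair}")
```

Inverse distances weight short-range contacts more heavily than direct distances do, which is one plausible reason to test both feature sets. In this toy setup the inverse features single out the nearest-neighbor pair of the shifted residue, whereas the direct features respond about equally to every pair that involves it.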