Zhang Yang, Vitalis Andreas
Department of Biochemistry, University of Zurich, Zurich, 8057, Switzerland.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf321.
Despite the rapid growth of machine learning in biomolecular applications, information about protein dynamics is underutilized. Here, we introduce Nearl, an automated pipeline designed to extract dynamic features from large ensembles of molecular dynamics trajectories. Nearl aims to identify intrinsic patterns of molecular motion and to provide informative features for predictive modeling tasks. We implement two classes of dynamic features, termed marching observers and property-density flow, to capture local atomic motions while maintaining a view of the global configuration. Complemented by standard voxelization techniques, Nearl transforms substructures of proteins into three-dimensional (3D) grids, suitable for contemporary 3D convolutional neural networks (3D-CNNs). The pipeline leverages graphics processing unit (GPU) acceleration, adheres to the FAIR principles for research software, and prioritizes flexibility and user-friendliness, allowing customization of input formats and feature extraction.
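As a rough illustration of the voxelization step mentioned above, the following minimal sketch bins per-atom weights from each trajectory frame onto a cubic grid and stacks the frames into an array that a 3D-CNN could consume. It does not use Nearl's API: the function `voxelize`, its parameters (`box_length`, `n_bins`), and the synthetic coordinates and masses are all hypothetical and chosen only for illustration. The marching-observer and property-density-flow features are not reproduced here.

```python
import numpy as np


def voxelize(coords, weights, center, box_length=16.0, n_bins=32):
    """Deposit per-atom weights onto a cubic grid centred on `center`.

    Hypothetical helper, not part of Nearl. Atoms outside the box are dropped.
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    half = box_length / 2.0
    # One array of bin edges per Cartesian axis.
    edges = [np.linspace(c - half, c + half, n_bins + 1) for c in center]
    grid, _ = np.histogramdd(coords, bins=edges, weights=weights)
    return grid


# Synthetic stand-in for a trajectory: 10 frames, 50 atoms, xyz coordinates.
rng = np.random.default_rng(0)
n_frames, n_atoms = 10, 50
trajectory = rng.normal(loc=0.0, scale=3.0, size=(n_frames, n_atoms, 3))
masses = rng.uniform(1.0, 16.0, size=n_atoms)  # assumed per-atom property

# One grid per frame, stacked along a leading time axis.
grids = np.stack(
    [voxelize(frame, masses, center=(0.0, 0.0, 0.0)) for frame in trajectory]
)

# A simple time-averaged density; a dynamic feature would instead summarize
# how each voxel's content changes across frames.
mean_grid = grids.mean(axis=0)
```

Simple histogram binning is used here for brevity; a smoother deposition scheme, such as Gaussian spreading of each atom over neighbouring voxels, would be an equally plausible choice.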
The source code of Nearl is hosted at https://github.com/miemiemmmm/Nearl and archived at https://doi.org/10.5281/zenodo.15320286. The documentation is hosted on ReadTheDocs at https://nearl.readthedocs.io/en/latest/. All pre-built models are implemented in PyTorch and available on GitHub.