Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195, Berlin, Germany.
Sci Data. 2022 Jun 17;9(1):327. doi: 10.1038/s41597-022-01297-3.
We present a data set from a first-principles study of amino-methylated and acetylated (capped) dipeptides of the 20 proteinogenic amino acids, including alternative possible side-chain protonation states, and of their interactions with selected divalent cations (Ca, Mg and Ba). The data cover 21,909 stationary points on the respective potential-energy surfaces, spanning a wide relative energy range of up to 4 eV (390 kJ/mol). Relevant properties of interest, such as partial charges, were derived for the conformers. The motivation was to provide a solid data basis for force field parameterization and for further applications such as machine learning or benchmarking. In particular, all data were created on the same first-principles footing, i.e. density-functional theory calculations employing the generalized gradient approximation with a van der Waals correction, which makes the data set suitable for first-principles data-driven force field development. To make the data accessible across domain borders and to machines, we formalized the metadata in an ontology.