Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
Sci Data. 2024 Oct 23;11(1):1164. doi: 10.1038/s41597-024-03999-2.
The function of a biomacromolecule is determined not only by its three-dimensional structure but also by its electronic state. Quantum chemical calculations are promising non-empirical methods for determining the electronic state of a given structure. In this study, we used the fragment molecular orbital (FMO) method, which is applicable to biopolymers such as proteins, to provide physicochemical property values for representative structures in SCOP2, a database of protein families covering a subset of the Protein Data Bank. Our dataset comprises over 5,000 protein structures and includes over 200 million inter-fragment interaction energies (IFIEs), together with their energy components obtained by pair interaction energy decomposition analysis (PIEDA) at the FMO-MP2/6-31G* level. Moreover, three basis sets, 6-31G*, 6-31G**, and cc-pVDZ, were used for the FMO calculations of each structure, making it possible to compare the energies obtained with different basis sets for the same fragment pair. The total data size is approximately 6.7 GB. Our dataset will be useful for functional analyses and machine learning based on the physicochemical property values of proteins.
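To illustrate how the IFIE and PIEDA values described above relate, and how the same fragment pair might be compared across the three basis sets, here is a minimal Python sketch. It assumes the standard FMO-MP2 PIEDA decomposition, in which a pair's IFIE is the sum of electrostatic (ES), exchange-repulsion (EX), charge-transfer-plus-mixing (CT+mix), and dispersion (DI) terms. The record type `PairEnergy`, its field names, and the helper `compare_basis_sets` are hypothetical and do not reflect the dataset's actual file format. Energies are assumed to be in kcal/mol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairEnergy:
    """One fragment-pair record (hypothetical schema, not the dataset's format)."""
    frag_i: int
    frag_j: int
    basis: str      # e.g. "6-31G*", "6-31G**", "cc-pVDZ"
    es: float       # electrostatic term (assumed kcal/mol)
    ex: float       # exchange-repulsion term
    ct_mix: float   # charge-transfer + mixing term
    di: float       # dispersion term (from MP2 correlation)

    @property
    def ifie(self) -> float:
        # PIEDA at the FMO-MP2 level: IFIE = ES + EX + CT+mix + DI
        return self.es + self.ex + self.ct_mix + self.di


def compare_basis_sets(records, ref="6-31G*"):
    """Return, per fragment pair, each basis set's IFIE minus the IFIE with `ref`."""
    by_pair = defaultdict(dict)
    for r in records:
        # Treat (I, J) and (J, I) as the same unordered fragment pair.
        key = (min(r.frag_i, r.frag_j), max(r.frag_i, r.frag_j))
        by_pair[key][r.basis] = r.ifie
    diffs = {}
    for pair, energies in by_pair.items():
        if ref not in energies:
            continue  # cannot compare without the reference basis set
        diffs[pair] = {b: e - energies[ref] for b, e in energies.items() if b != ref}
    return diffs
```

Keying pairs on the sorted fragment indices lets records for (I, J) and (J, I) computed with different basis sets be matched as one pair; how the actual dataset indexes pairs would need to be checked against its documentation.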