Quasar Science Resources S.L., Camino de las Ceudas 2, E-28232 Las Rozas de Madrid, Spain.
Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain.
J Chem Inf Model. 2022 Mar 14;62(5):1214-1223. doi: 10.1021/acs.jcim.1c01323. Epub 2022 Mar 2.
This paper introduces Quasar Science Resources-Autonomous University of Madrid atomic force microscopy image data set (QUAM-AFM), the largest data set of simulated atomic force microscopy (AFM) images, generated from a selection of 685,513 molecules that span the most relevant bonding structures and chemical species in organic chemistry. For each molecule, QUAM-AFM contains 24 3D image stacks. Each stack uses a different combination of AFM operational parameters and consists of constant-height images simulated at 10 tip-sample distances, giving a total of about 165 million images with a resolution of 256 × 256 pixels. The 3D stacks are especially well suited to chemical identification in AFM experiments using deep learning techniques. Besides the AFM images, the data provided for each molecule include ball-and-stick depictions, IUPAC names, chemical formulas, atomic coordinates, and a map of atom heights. To make the collection easier to use as a source of information, we have developed a graphical user interface that allows users to search for structures by CID number, IUPAC name, or chemical formula.
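The image count quoted above follows directly from the data set layout. The sketch below checks that arithmetic using only the figures stated in the abstract (685,513 molecules, 24 stacks per molecule, 10 tip-sample distances per stack, 256 × 256 pixels). The uncompressed-size figure rests on a hypothetical assumption of one byte per pixel; the abstract does not state the actual storage format, so treat that number as a rough raw-pixel bound only.

```python
# Figures taken from the QUAM-AFM abstract.
N_MOLECULES = 685_513       # molecules in the data set
STACKS_PER_MOLECULE = 24    # one 3D stack per AFM parameter combination
IMAGES_PER_STACK = 10       # constant-height images, one per tip-sample distance
IMAGE_SIDE = 256            # each image is 256 x 256 pixels

# Hypothetical assumption (not from the abstract): 8-bit grayscale, uncompressed.
BYTES_PER_PIXEL = 1


def images_per_molecule() -> int:
    """Images simulated for one molecule: stacks times distances per stack."""
    return STACKS_PER_MOLECULE * IMAGES_PER_STACK


def total_images() -> int:
    """Total number of simulated images across all molecules."""
    return N_MOLECULES * images_per_molecule()


def uncompressed_bytes() -> int:
    """Raw pixel storage under the 1-byte-per-pixel assumption above."""
    return total_images() * IMAGE_SIDE * IMAGE_SIDE * BYTES_PER_PIXEL


if __name__ == "__main__":
    print(images_per_molecule())                    # 240
    print(f"{total_images():,}")                    # 164,523,120
    print(f"{uncompressed_bytes() / 1e12:.1f} TB")  # 10.8 TB
```

The exact product, 164,523,120 images, rounds to the "about 165 million" given in the abstract.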