Zhang Initiative Research Unit, Institute Laboratories, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
J Cheminform. 2014 May 10;6:23. doi: 10.1186/1758-2946-6-23. eCollection 2014.
Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods, including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven useful in drug discovery. Because of our interest in electrostatics and high-throughput ligand-based virtual screening, we sought to exploit the information contained in the atomic coordinates and partial charges of a molecule.
A new molecular descriptor based on partial charges is proposed. It uses the autocorrelation function and linear binning to encode all atoms of a molecule into two rotation- and translation-invariant vectors. Combined with a scoring function, the descriptor allows a database of compounds to be rank-ordered against a query molecule. The proposed implementation, called ACPC (AutoCorrelation of Partial Charges), is released as open source. To validate the method and its associated protocol, extensive retrospective ligand-based virtual screening experiments were performed and the results were compared with those of other methods.
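The encoding described above can be illustrated with a minimal sketch. It is not the ACPC implementation: the bin width, the distance cut-off, the choice to split atom pairs into a "same-sign" and an "opposite-sign" vector according to the sign of the charge product, and the Ruzicka (continuous Tanimoto) scoring function are all assumptions made for illustration, and every function name is hypothetical. What the sketch does show is why the vectors are rotation- and translation-invariant (only interatomic distances are used) and how linear binning spreads each pair's weight over the two nearest distance bins.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# (x, y, z, partial_charge); coordinates assumed to be in angstroms
Atom = Tuple[float, float, float, float]
Descriptor = Tuple[List[float], List[float]]


def autocorr_descriptor(atoms: Sequence[Atom],
                        bin_width: float = 0.5,   # hypothetical value
                        max_dist: float = 20.0    # hypothetical cut-off
                        ) -> Descriptor:
    """Hypothetical ACPC-like encoding: autocorrelation of charge products
    over interatomic distances, linearly binned into two vectors."""
    n_bins = math.ceil(max_dist / bin_width) + 1
    same_sign = [0.0] * n_bins      # pairs with q_i * q_j > 0 (assumed split)
    opposite_sign = [0.0] * n_bins  # pairs with q_i * q_j < 0
    for (x1, y1, z1, q1), (x2, y2, z2, q2) in combinations(atoms, 2):
        # Only distances enter the descriptor, so rotating or translating
        # the whole molecule leaves it unchanged.
        d = math.dist((x1, y1, z1), (x2, y2, z2))
        w = q1 * q2
        if d > max_dist or w == 0.0:
            continue
        target = same_sign if w > 0.0 else opposite_sign
        # Linear binning: share |w| between bins k and k + 1 in proportion
        # to how close d lies to each bin position.
        pos = d / bin_width
        k = int(pos)  # floor, since pos >= 0
        frac = pos - k
        target[k] += abs(w) * (1.0 - frac)
        if frac > 0.0:
            target[k + 1] += abs(w) * frac
    return same_sign, opposite_sign


def ruzicka(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto (Ruzicka) similarity of non-negative vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0.0 else 0.0


def score(query: Descriptor, candidate: Descriptor) -> float:
    # Hypothetical scoring function: similarity of the concatenated vectors.
    return ruzicka(query[0] + query[1], candidate[0] + candidate[1])


def rank_database(query: Descriptor,
                  database: Sequence[Tuple[str, Descriptor]]
                  ) -> List[Tuple[str, float]]:
    """Rank-order (name, descriptor) pairs by decreasing score vs. the query."""
    scored = [(name, score(query, desc)) for name, desc in database]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy three-atom "molecule" with made-up partial charges.
    query = [(0.0, 0.0, 0.0, -0.8), (0.96, 0.0, 0.0, 0.4),
             (-0.24, 0.93, 0.0, 0.4)]
    shifted = [(x + 5.0, y, z, q) for x, y, z, q in query]
    stretched = [(1.5 * x, 1.5 * y, 1.5 * z, q) for x, y, z, q in query]
    db = [("stretched", autocorr_descriptor(stretched)),
          ("shifted", autocorr_descriptor(shifted))]
    print(rank_database(autocorr_descriptor(query), db))
```

In this toy run the translated copy scores 1.0 against the query and is ranked first, while the stretched copy, whose interatomic distances fall into different bins, scores lower. Linear binning is used instead of plain histogramming so that a small change in a distance moves weight smoothly between neighbouring bins rather than jumping from one bin to the next.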
Although ACPC is a simple method, it performed remarkably well in these experiments. At an average speed of 1649 molecules per second, it reached an average median area under the ROC curve (AUC) of 0.81 on 40 different targets, thereby validating the proposed protocol and implementation.
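The "average median" AUC figure combines two aggregation steps that the abstract does not spell out. A minimal sketch, assuming the curve is the ROC curve (the usual choice in retrospective virtual screening), that each target is evaluated over several runs (for example, one run per query active; this split is an assumption) whose AUCs are reduced to a median, and that the per-target medians are then averaged over all targets. The function names are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen active is scored above a randomly chosen decoy, with ties counting one half.

```python
from statistics import mean, median
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC as P(score of a random active > score of a random decoy),
    counting ties as one half."""
    actives = [s for s, act in zip(scores, is_active) if act]
    decoys = [s for s, act in zip(scores, is_active) if not act]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for sa in actives:
        for sd in decoys:
            if sa > sd:
                wins += 1.0
            elif sa == sd:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def average_median_auc(aucs_per_target: Dict[str, Sequence[float]]) -> float:
    """Median AUC over the runs of each target, then the mean over targets."""
    return mean(median(runs) for runs in aucs_per_target.values())
```

For instance, scores `[0.9, 0.8, 0.3, 0.1]` with activity labels `[True, False, True, False]` give an AUC of 0.75, since three of the four active/decoy pairs are correctly ordered. The pairwise loop is quadratic and only meant to make the definition explicit; a sort-based rank computation is preferable for large screening libraries.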