Xue Ling, Godden Jeffrey W, Stahura Florence L, Bajorath Jürgen
Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc., Bothell Research Center (AMRI-BRC), 18804 North Creek Parkway, Bothell, Washington 98011, USA.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1151-7. doi: 10.1021/ci030285+.
A new fingerprint design concept is introduced that transforms molecular property descriptors into two-state descriptors and thus permits binary encoding. This transformation is based on the calculation of statistical medians of descriptor distributions in large compound collections and alleviates the need for value range encoding of these descriptors. For binary encoded property descriptors, bit positions that are set off capture as much information as bit positions that are set on, unlike in conventional fingerprint representations. Accordingly, a variant of the Tanimoto coefficient has been defined for comparison of these fingerprints. Following our design idea, a prototypic fingerprint termed MP-MFP was implemented by combining 61 binary encoded property descriptors with 110 structural fragment-type descriptors. The performance of this fingerprint was evaluated in systematic similarity search calculations in a database containing 549 molecules belonging to 38 different activity classes and 5000 background molecules. In these calculations, MP-MFP correctly recognized approximately 34% of all similarity relationships, with only 0.04% false positives, and performed better than previous designs and MACCS keys. The results suggest that combinations of simplified two-state property descriptors have predictive value in the analysis of molecular similarity.
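The median-based two-state encoding and the on/off-bit comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MP-MFP implementation. The abstract gives neither the exact threshold rule nor the formula of the Tanimoto variant, so the sketch assumes a descriptor is set to 1 when its value exceeds the collection median. As a hypothetical stand-in for the paper's variant, it averages a Tanimoto coefficient computed over 1-bits with one computed over 0-bits. The function names and toy descriptor values are invented for illustration, and the 110 structural fragment-type bits are not modeled.

```python
from statistics import median
from typing import List, Sequence


def descriptor_medians(collection: Sequence[Sequence[float]]) -> List[float]:
    """Median of each descriptor column over a reference compound collection."""
    n_desc = len(collection[0])
    return [median([row[j] for row in collection]) for j in range(n_desc)]


def binarize(values: Sequence[float], medians: Sequence[float]) -> List[int]:
    """Two-state encoding (assumed rule): 1 if value > median, else 0."""
    return [1 if v > m else 0 for v, m in zip(values, medians)]


def tanimoto(a: Sequence[int], b: Sequence[int], state: int = 1) -> float:
    """Tanimoto coefficient counted over positions whose bit equals `state`."""
    common = sum(1 for x, y in zip(a, b) if x == state and y == state)
    n_a = sum(1 for x in a if x == state)
    n_b = sum(1 for y in b if y == state)
    denom = n_a + n_b - common
    # Convention assumed here: if neither fingerprint has a bit in `state`,
    # they trivially agree on that state.
    return common / denom if denom else 1.0


def avg_tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical variant: mean of the 1-bit and 0-bit Tanimoto coefficients."""
    return (tanimoto(a, b, 1) + tanimoto(a, b, 0)) / 2


# Toy reference collection: 4 compounds x 3 property descriptors (invented values).
collection = [
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 0.1],
    [3.0, 30.0, 0.9],
    [4.0, 40.0, 0.3],
]
meds = descriptor_medians(collection)    # medians: 2.5, 25.0, 0.4
fa = binarize([3.5, 12.0, 0.8], meds)    # [1, 0, 1]
fb = binarize([1.5, 35.0, 0.7], meds)    # [0, 1, 1]
print(tanimoto(fa, fb, 1), avg_tanimoto(fa, fb))
```

On this toy pair the conventional 1-bit Tanimoto coefficient is 1/3, while the averaged score also counts the disagreement among the 0-bits and drops to 1/6. This illustrates why, for median-encoded descriptors, off bits can carry as much information as on bits.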