José Batista, Jürgen Bajorath
Department of Life Science Informatics, B-IT, LIMES Institute, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, 53113, Bonn, Germany.
Mol Divers. 2008 Feb;12(1):77-83. doi: 10.1007/s11030-008-9078-8. Epub 2008 May 28.
Substructures are among the most preferred molecular descriptors in chemoinformatics and medicinal chemistry. Conventional substructure-type descriptors are typically the result of well-defined design strategies. We previously introduced Activity Class Characteristic Substructures (ACCS), derived from randomly generated molecular fragment populations, and described their utility in similarity searching. Short ACCS fingerprints were found to perform surprisingly well on many compound classes compared with more complex state-of-the-art 2D fingerprints. To elucidate potential reasons for the high predictive utility of ACCS, we carried out a thorough analysis of their distribution in nine activity classes and nearly four million database compounds. We show that the discriminatory power of ACCS results from the rare occurrence of ACCS combinations in screening databases.
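The central claim, that individual substructures may be common in a database while their combination is rare, can be illustrated with a minimal sketch. It assumes each compound is encoded as the set of substructure keys it contains (the "on" bits of a binary fingerprint) and uses the Tanimoto coefficient, a common similarity measure for 2D fingerprints. The keys `A`, `B`, `C`, the eight-compound database, and the function names `tanimoto`, `key_frequency`, and `combination_frequencies` are hypothetical. The sketch does not reproduce how the authors derived ACCS or how they analyzed their distribution.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical toy representation: a compound is the set of substructure keys
# (fingerprint bits) it contains. These labels are placeholders, not real ACCS.
Compound = FrozenSet[str]


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of set bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def key_frequency(db: List[Compound], keys: Set[str]) -> float:
    """Fraction of database compounds that contain ALL of the given keys."""
    if not db:
        return 0.0
    hits = sum(1 for compound in db if keys <= compound)
    return hits / len(db)


def combination_frequencies(
    db: List[Compound], keys: Iterable[str], size: int
) -> Dict[Tuple[str, ...], float]:
    """Database frequency of every combination of `size` keys."""
    return {
        combo: key_frequency(db, set(combo))
        for combo in combinations(sorted(keys), size)
    }


# Hypothetical screening database of eight compounds (illustrative only).
DB: List[Compound] = [
    frozenset(s)
    for s in [{"A", "B"}, {"B", "C"}, {"A", "C"}, {"A"},
              {"B"}, {"C"}, {"A", "B", "C"}, {"A", "B"}]
]

if __name__ == "__main__":
    # Each key on its own is common in the toy database...
    print(combination_frequencies(DB, "ABC", 1))
    # ...but the full three-key combination occurs in only one compound.
    print(combination_frequencies(DB, "ABC", 3))
    # Similarity of a compound carrying all three keys to one carrying two.
    print(tanimoto({"A", "B", "C"}, {"A", "B"}))
```

With this made-up data, each single key occurs in 50% to 62.5% of the compounds, while the three-key combination occurs in only one compound in eight (12.5%). A fingerprint that requires several such keys together therefore matches far fewer database compounds than any one key would suggest. These numbers come only from the toy data and are not results from the study.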