Ertl Peter
Novartis Pharma AG, Molecular Simulation Group, WKL-125.14.20, CH-4002 Basel, Switzerland.
J Chem Inf Comput Sci. 2003 Mar-Apr;43(2):374-80. doi: 10.1021/ci0255782.
A large set of more than 3 million molecules was processed to find all the organic substituents contained in the set and to identify the most common ones. During the analysis, 849 574 unique substituents were found. Extrapolated to the number of known organic molecules, this result suggests that about 3.1 million substituents are known. Based on these findings, the size of the virtual organic chemistry space accessible using currently known synthetic methods is estimated to be between 10^20 and 10^24 molecules. The extracted substituents were characterized by calculated electronic, hydrophobic, steric, and hydrogen bonding properties as well as by a drug-likeness index. Various possible applications of such a large database of drug-like substituents characterized by calculated properties are discussed and illustrated by reference to a Web-based tool for automatic identification of bioisosteric groups.
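The abstract does not say how the 10^20 to 10^24 range was derived. As a rough illustration of why a pool of a few million substituents leads to numbers of that order, the sketch below counts naive combinations of substituents on a scaffold. The function name, the single-scaffold default, and the choice of attachment-point counts are assumptions made for illustration, not the author's method.

```python
import math

# Figure taken from the abstract: estimated number of known substituents.
KNOWN_SUBSTITUENTS = 3_100_000


def log10_library_size(n_substituents: int, n_positions: int, n_scaffolds: int = 1) -> float:
    """Return log10 of n_scaffolds * n_substituents ** n_positions.

    Hypothetical counting model: every attachment point is treated as
    distinct and may carry any substituent, so this is an upper bound that
    ignores symmetry and synthetic feasibility.
    """
    return math.log10(n_scaffolds) + n_positions * math.log10(n_substituents)


if __name__ == "__main__":
    # Assumed attachment-point counts, chosen only to show the scaling.
    for positions in (2, 3, 4):
        exponent = log10_library_size(KNOWN_SUBSTITUENTS, positions)
        print(f"{positions} attachment points: about 10^{exponent:.1f} combinations")
```

Under these assumptions, three and four attachment points on a single scaffold give roughly 10^19.5 and 10^26 combinations, which brackets the range quoted in the abstract. A narrower estimate would need restrictions that this toy model leaves out.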