Ghose A K, Viswanadhan V N, Wendoloski J J
Amgen Inc., Thousand Oaks, California 91320, USA.
J Comb Chem. 1999 Jan;1(1):55-68. doi: 10.1021/cc9800071.
The discovery of various protein/receptor targets from genomic research is expanding rapidly. Together with the automation of organic synthesis and biochemical screening, this is bringing a major change to the whole field of drug discovery research. In the traditional drug discovery process, the industry tests compounds in the thousands; with automated synthesis, the number of compounds to be tested could be in the millions. This two-dimensional expansion (more targets and more compounds) will create a major demand for resources unless chemical libraries are designed wisely. The objective of this work is to provide both a quantitative and a qualitative characterization of known drugs that will help in generating "drug-like" libraries. In this work we analyzed the Comprehensive Medicinal Chemistry (CMC) database and seven different subsets belonging to different classes of drug molecules. These include some central nervous system active drugs and drugs for cardiovascular, cancer, inflammation, and infectious disease states. A quantitative characterization based on computed physicochemical property profiles (log P, molar refractivity, molecular weight, and number of atoms) and a qualitative characterization based on the occurrence of functional groups and important substructures are developed here. For the CMC database, the qualifying range (covering more than 80% of the compounds) of the calculated log P is between -0.4 and 5.6, with an average value of 2.52. For molecular weight, the qualifying range is between 160 and 480, with an average value of 357. For molar refractivity, the qualifying range is between 40 and 130, with an average value of 97. For the total number of atoms, the qualifying range is between 20 and 70, with an average value of 48. Benzene is by far the most abundant substructure in this drug database, slightly more abundant than all the heterocyclic rings combined. Nonaromatic heterocyclic rings are twice as abundant as aromatic heterocycles. Tertiary aliphatic amines, alcoholic OH groups, and carboxamides are the most abundant functional groups in the drug database. The effective ranges of physicochemical properties presented here can be used in the design of drug-like combinatorial libraries as well as in developing a more efficient corporate medicinal chemistry library.
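The qualifying ranges reported for the CMC database lend themselves to a simple property filter for screening library candidates. The following is a minimal, hypothetical sketch, not the authors' method. It assumes the four descriptors have already been computed elsewhere, ideally with the same calculated log P and molar refractivity methods used for the CMC analysis (the abstract says only "calculated log P"). It also assumes that "total number of atoms" is supplied consistently with the original counting convention and that the range bounds are inclusive, which the abstract does not specify. The names `Descriptors`, `QUALIFYING_RANGES`, `out_of_range`, and `is_drug_like` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Qualifying ranges reported for the CMC database; each covers more than 80%
# of the compounds. Bounds are treated as inclusive (an assumption).
QUALIFYING_RANGES = {
    "logp": (-0.4, 5.6),
    "molecular_weight": (160.0, 480.0),
    "molar_refractivity": (40.0, 130.0),
    "atom_count": (20, 70),
}


@dataclass
class Descriptors:
    """Hypothetical container for precomputed descriptors of one compound."""
    logp: float
    molecular_weight: float
    molar_refractivity: float
    atom_count: int


def out_of_range(d: Descriptors) -> list[str]:
    """Return the names of descriptors that fall outside their qualifying range."""
    failures = []
    for name, (low, high) in QUALIFYING_RANGES.items():
        value = getattr(d, name)
        if not (low <= value <= high):
            failures.append(name)
    return failures


def is_drug_like(d: Descriptors) -> bool:
    """True when every descriptor lies within its qualifying range."""
    return not out_of_range(d)


if __name__ == "__main__":
    # A compound sitting at the reported CMC averages passes every check.
    typical = Descriptors(logp=2.52, molecular_weight=357.0,
                          molar_refractivity=97.0, atom_count=48)
    # An oversized, lipophilic compound fails all four checks.
    greasy = Descriptors(logp=6.1, molecular_weight=520.0,
                         molar_refractivity=150.0, atom_count=75)
    print(is_drug_like(typical))  # True
    print(out_of_range(greasy))   # ['logp', 'molecular_weight', 'molar_refractivity', 'atom_count']
```

Returning the list of failing descriptors, rather than a bare pass/fail, shows how far a candidate drifts from the drug-like profile. Because each range covers only somewhat more than 80% of the CMC compounds, a failure flags an atypical compound rather than proving that it cannot be a drug.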