Oprea T I
Astra Hässle AB, Mölndal, Sweden.
J Comput Aided Mol Des. 2000 Mar;14(3):251-64. doi: 10.1023/a:1008130001697.
The process of compound selection and prioritization is crucial for both combinatorial chemistry (CBC) and high throughput screening (HTS). Compound libraries have to be screened for unwanted chemical structures, as well as for unwanted chemical properties. Property extrema can be eliminated by using property filters, in accordance with their actual distribution. Property distribution was examined in the following compound databases: MACCS-II Drug Data Report (MDDR), Current Patents Fast-alert, Comprehensive Medicinal Chemistry, Physician Desk Reference, New Chemical Entities, and the Available Chemical Directory (ACD). The ACDF and MDDRF subsets were created by removing reactive functionalities from the ACD and MDDR databases, respectively. The ACDF subset was further filtered by keeping only molecules with a 'drug-like' score [Ajay et al., J. Med. Chem., 41 (1998) 3314; Sadowski and Kubinyi, J. Med. Chem., 41 (1998) 3325] below 0.8. The following properties were examined: molecular weight (MW), the calculated octanol/water partition coefficient (CLOGP), the number of rotatable (RTB) and rigid bonds (RGB), the number of rings (RNG), and the number of hydrogen bond donors (HDO) and acceptors (HAC). Of these, MW and CLOGP follow a Gaussian distribution, whereas all other descriptors have an asymmetric (truncated Gaussian) distribution. Four out of five compounds in ACDF and MDDRF pass the 'rule of 5' test, a probability scheme that estimates oral absorption proposed by Lipinski et al. [Adv. Drug Deliv. Rev., 23 (1997) 3]. Because property distributions of HDO, HAC, MW and CLOGP (used in the 'rule of 5' test) do not differ significantly between these datasets, the 'rule of 5' does not distinguish 'drugs' from 'nondrugs'. Therefore, Pareto analyses were performed to examine skewed distributions in all compound collections. 
Seventy percent of the 'drug-like' compounds were found within the following limits: 0 ≤ HDO ≤ 2, 2 ≤ HAC ≤ 9, 2 ≤ RTB ≤ 8, and 1 ≤ RNG ≤ 4. The number of launched drugs in MDDR having 0 ≤ HDO ≤ 2 is 4.8 times higher than the number of drugs having 3 ≤ HDO ≤ 5. Skewed distributions can be exploited to focus on the 'drug-like space': 62.68% of ACDF ('nondrug-like') compounds have 0 ≤ RNG ≤ 2 and RGB ≤ 17, while 28.88% of ACDF compounds have 3 ≤ RNG ≤ 13 and 18 ≤ RGB ≤ 56. By contrast, 61.22% of MDDRF compounds have RNG ≥ 3 and RGB ≥ 18, and only 24.73% of MDDRF compounds have 0 ≤ RNG ≤ 2 and RGB ≤ 17. The probability of identifying 'drug-like' structures increases with molecular complexity.
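The property limits quoted above can be read as a simple range filter. The following is a minimal sketch, not the author's implementation: the `Descriptors` class, the function names, and the example values are hypothetical, and the descriptor counts are assumed to have been computed elsewhere. Combining the four ranges with a logical AND is also an assumption; the abstract gives the limits per property, so the fraction of compounds passing all four at once need not be 70%.

```python
from dataclasses import dataclass

# Inclusive (low, high) limits quoted in the abstract for 'drug-like' compounds.
DRUGLIKE_RANGES = {
    "hdo": (0, 2),  # hydrogen bond donors
    "hac": (2, 9),  # hydrogen bond acceptors
    "rtb": (2, 8),  # rotatable bonds
    "rng": (1, 4),  # rings
}


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for precomputed descriptor counts."""
    hdo: int
    hac: int
    rtb: int
    rng: int
    rgb: int  # rigid bonds


def in_druglike_ranges(d: Descriptors) -> bool:
    """True if every filtered property lies within its quoted range.

    Requiring all four ranges at once is an assumption of this sketch.
    """
    return all(
        low <= getattr(d, name) <= high
        for name, (low, high) in DRUGLIKE_RANGES.items()
    )


def in_complex_region(d: Descriptors) -> bool:
    """True for the RNG >= 3 and RGB >= 18 region that the abstract
    reports for 61.22% of MDDRF compounds."""
    return d.rng >= 3 and d.rgb >= 18


# Illustrative values only; these are not taken from any database.
small_fragment = Descriptors(hdo=1, hac=1, rtb=0, rng=0, rgb=3)
larger_scaffold = Descriptors(hdo=1, hac=5, rtb=4, rng=3, rgb=22)

for compound in (small_fragment, larger_scaffold):
    print(in_druglike_ranges(compound), in_complex_region(compound))
```

Keeping the limits in a dictionary makes it easy to swap in ranges derived from a different compound collection without touching the filter logic.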