Patterson D E, Cramer R D, Ferguson A M, Clark R D, Weinberger L E
Tripos, Inc., St. Louis, Missouri 63144, USA.
J Med Chem. 1996 Aug 2;39(16):3049-59. doi: 10.1021/jm960290n.
When searching for new leads, testing molecules that are too "similar" is wasteful, but when investigating a lead, testing molecules that are "similar" to the lead is efficient. Two questions then arise: which molecular descriptors should be "similar", and how much "similarity" is enough? These questions are answered by demonstrating that, if a molecular descriptor is to be a valid and useful measure of "similarity" in drug discovery, a plot of differences in its values versus differences in biological activities for a set of related molecules will exhibit a characteristic trapezoidal distribution enhancement, revealing a "neighborhood behavior" for the descriptor. Applying this finding to 20 datasets allows 11 molecular diversity descriptors to be ranked by their validity for compound library design. In order of increasing frequency of usefulness, these are: random numbers = log P = molar refractivity (MR) = strain energy < connectivity indices < 2D fingerprints (whole molecule) = atom pairs = autocorrelation indices < steric CoMFA fields = 2D fingerprints (side chain only) = H-bonding CoMFA fields.
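The plot described in the abstract can be illustrated with a minimal sketch that computes, for every pair of molecules, the difference in a descriptor and the difference in activity. The function names, the toy data, and the diagonal-based score are all assumptions made for illustration; the abstract does not specify the statistic the authors use to quantify the trapezoidal enhancement, so the score here is only a crude stand-in.

```python
import itertools


def pairwise_differences(descriptor, activity):
    """Return (|d_i - d_j|, |a_i - a_j|) for every unordered pair of molecules."""
    return [
        (abs(descriptor[i] - descriptor[j]), abs(activity[i] - activity[j]))
        for i, j in itertools.combinations(range(len(descriptor)), 2)
    ]


def upper_left_fraction(pairs):
    """Fraction of pairs strictly above the diagonal of the difference plot.

    Hypothetical score (not the paper's statistic): the diagonal runs from
    (0, 0) to (max descriptor difference, max activity difference). A point
    above it is a pair whose activity difference is large relative to its
    descriptor difference. A descriptor with neighborhood behavior should
    leave that upper-left region sparsely populated.
    """
    max_dx = max(dx for dx, _ in pairs)
    max_dy = max(dy for _, dy in pairs)
    if max_dx == 0 or max_dy == 0:
        return 0.0
    above = sum(1 for dx, dy in pairs if dy / max_dy > dx / max_dx)
    return above / len(pairs)


# Toy data (invented for illustration, not taken from the paper).
descriptor = [0, 1, 2, 3, 4]
tracking_activity = [0, 1, 2, 3, 4]   # activity varies smoothly with the descriptor
unrelated_activity = [0, 4, 0, 4, 0]  # activity jumps between descriptor neighbors

good_pairs = pairwise_differences(descriptor, tracking_activity)
bad_pairs = pairwise_differences(descriptor, unrelated_activity)

print(upper_left_fraction(good_pairs))
print(upper_left_fraction(bad_pairs))
```

In this toy case the descriptor that tracks activity leaves the upper-left region empty, while the unrelated one fills it. A real analysis would use measured activities and multidimensional descriptor distances rather than a single scalar.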