Sharma Puneet, Salapaka Srinivasa, Beck Carolyn
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, 104 S. Mathews Avenue, Urbana, Illinois 61801, USA.
J Chem Inf Model. 2008 Jan;48(1):27-41. doi: 10.1021/ci700023y. Epub 2007 Dec 6.
In this paper, we propose an algorithm for designing the lead generation libraries required in combinatorial drug discovery. The algorithm simultaneously addresses the two key criteria of diversity and representativeness of compounds in the resulting library, and it is computationally efficient for a large class of lead generation design problems. The framework also incorporates additional constraints on experimental resources. To make the approach scalable, we exploit the ability of the deterministic annealing algorithm to identify clusters, truncating computations over the entire data set to computations over individual clusters. An analysis of this algorithm quantifies the tradeoff between the error due to truncation and the computational effort. Results on test data sets corroborate the analysis and show improvements by a factor of 10 or more, depending on the data set.
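The abstract names deterministic annealing as the clustering engine but gives no details, so the following is only a minimal sketch of a generic deterministic annealing loop, not the authors' implementation. It assumes squared Euclidean distance, a geometric cooling schedule, and a fixed number of cluster centers; the function name `deterministic_annealing` and all parameter names and defaults are hypothetical. The truncation to individual clusters and the experimental resource constraints described in the abstract are not reproduced here.

```python
import math
import random


def deterministic_annealing(points, k, t_start=10.0, t_min=0.01,
                            cooling=0.8, inner_iters=20, seed=0):
    """Return k cluster centers found by a basic deterministic annealing loop.

    Assumptions (not taken from the paper): squared Euclidean distance,
    geometric cooling, and a fixed number of centers k.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # All centers start near the global mean; the tiny perturbations let
    # them separate once the temperature is low enough.
    centers = [[m + rng.uniform(-1e-3, 1e-3) for m in mean] for _ in range(k)]

    t = t_start
    while t > t_min:
        for _ in range(inner_iters):
            # Soft (Gibbs) association of every point with every center.
            weights = []
            for p in points:
                d2 = [sum((p[d] - c[d]) ** 2 for d in range(dim))
                      for c in centers]
                d_min = min(d2)  # shift exponents for numerical stability
                w = [math.exp(-(v - d_min) / t) for v in d2]
                total = sum(w)
                weights.append([v / total for v in w])
            # Move each center to the weighted mean of the points.
            for j in range(k):
                mass = sum(w[j] for w in weights)
                if mass > 1e-12:  # leave a center in place if it owns nothing
                    centers[j] = [
                        sum(w[j] * p[d] for w, p in zip(weights, points)) / mass
                        for d in range(dim)
                    ]
        t *= cooling  # lower the temperature geometrically
    return centers
```

Calling `deterministic_annealing(points, 2)` on a list of coordinate tuples returns two centers as lists of floats. At high temperature every point is shared almost equally among the centers, and as the temperature drops the associations harden toward a nearest-center assignment, which is the cluster-identifying behavior the abstract refers to.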