Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.
J Chem Inf Model. 2011 Jul 25;51(7):1545-51. doi: 10.1021/ci2001692. Epub 2011 Jun 24.
The characterization of structure-activity relationship (SAR) features of large compound data sets has received considerable attention in recent years, and different methods for large-scale SAR analysis have been introduced. The exploration of local SAR components and the prioritization of compound subsets have thus far mostly relied on graphical analysis methods that systematically capture similarity and potency relationships. A currently unsolved problem in large-scale SAR analysis is how to automatically select, from large data sets, those compound subsets that carry the most SAR information. For this purpose, we introduce a numerical optimization scheme based on particle swarm optimization guided by an SAR scoring function. The methodology is applied to four large compound sets. We demonstrate that compound subsets representing the most discontinuous local SARs are consistently selected through particle swarm optimization.
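To make the idea of score-guided subset selection concrete, the following is a minimal sketch of a standard particle swarm optimization loop, not the authors' implementation. The abstract does not specify the scoring function or how particles map to compound subsets, so everything here is an assumption for illustration: compounds are placed in a hypothetical low-dimensional descriptor space (`coords`), each particle position selects its `k` nearest compounds, and `discontinuity_score` is a made-up stand-in that rewards pairs of similar compounds with large potency differences.

```python
import math
import random


def discontinuity_score(subset, coords, potency):
    """Hypothetical SAR score (assumption, not from the paper): mean over all
    pairs of similarity * |potency difference|. High when close neighbors
    differ strongly in potency, i.e. the local SAR is discontinuous."""
    total, pairs = 0.0, 0
    for a in range(len(subset)):
        for b in range(a + 1, len(subset)):
            i, j = subset[a], subset[b]
            similarity = 1.0 / (1.0 + math.dist(coords[i], coords[j]))
            total += similarity * abs(potency[i] - potency[j])
            pairs += 1
    return total / pairs if pairs else 0.0


def nearest_subset(position, coords, k):
    """Map a particle position to a subset: indices of the k nearest compounds."""
    order = sorted(range(len(coords)), key=lambda i: math.dist(position, coords[i]))
    return sorted(order[:k])


def pso_select(coords, potency, k=5, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO that maximizes the score of the selected subset.
    Parameter names and defaults are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(coords[0])
    lo = [min(c[d] for c in coords) for d in range(dim)]
    hi = [max(c[d] for c in coords) for d in range(dim)]

    def fitness(p):
        return discontinuity_score(nearest_subset(p, coords, k), coords, potency)

    # Random start positions inside the bounding box, zero initial velocity.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the bounding box of the data.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    return nearest_subset(gbest, coords, k), gbest_fit
```

Calling `pso_select(coords, potency, k=5)` with a list of coordinate tuples and a matching list of potency values returns the indices of the selected compounds together with their score. Mapping a continuous particle position to its nearest compounds is only one possible way to apply PSO to a discrete selection problem; the published method may encode subsets differently.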