Chen Beining, Harrison Robert F, Papadatos George, Willett Peter, Wood David J, Lewell Xiao Qing, Greenidge Paulette, Stiefl Nikolaus
Department of Chemistry, University of Sheffield, Western Bank, Sheffield, UK.
J Comput Aided Mol Des. 2007 Jan-Mar;21(1-3):53-62. doi: 10.1007/s10822-006-9096-5. Epub 2007 Jan 5.
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training-set contains only a very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
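The abstract does not spell out how group fusion is computed. As a rough illustration only, the sketch below assumes one common formulation: each database molecule is scored by its maximum Tanimoto similarity to any of the few known actives, and the database is then ranked by that fused score. The set-of-on-bits fingerprint representation, the MAX fusion rule, and the function names `tanimoto`, `group_fusion_scores` and `rank_by_score` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def group_fusion_scores(actives: Sequence[Set[int]],
                        database: Sequence[Set[int]]) -> List[float]:
    """Score each database molecule by its MAX similarity to any known active.

    The MAX rule is an assumption for this sketch; other fusion rules exist.
    """
    return [max(tanimoto(fp, ref) for ref in actives) for fp in database]


def rank_by_score(ids: Sequence[str],
                  scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (id, score) pairs sorted from most to least similar."""
    return sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)


# Toy data: two known actives and three database molecules (hypothetical bits).
actives = [{1, 2, 3}, {7, 8}]
database = [{1, 2, 3, 4}, {7, 8}, {10, 11}]
scores = group_fusion_scores(actives, database)
ranking = rank_by_score(["a", "b", "c"], scores)
```

Because each molecule only needs to resemble one of the reference actives to score highly, a scheme of this kind does not require the actives to share a single common pattern, which is consistent with the abstract's remark about structurally heterogeneous datasets.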