King R D, Muggleton S H, Srinivasan A, Sternberg M J
Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, United Kingdom.
Proc Natl Acad Sci U S A. 1996 Jan 9;93(1):438-42. doi: 10.1073/pnas.93.1.438.
We present a general approach to forming structure-activity relationships (SARs). This approach is based on representing chemical structure by atoms and their bond connectivities in combination with the inductive logic programming (ILP) algorithm PROGOL. Existing SAR methods describe chemical structure by using attributes which are general properties of an object. It is not possible to map chemical structure directly to attribute-based descriptions, as such descriptions have no internal organization. A more natural and general way to describe chemical structure is to use a relational description, where the internal construction of the description maps that of the object described. Our atom and bond connectivities representation is a relational description. ILP algorithms can form SARs with relational descriptions. We have tested the relational approach by investigating the SARs of 230 aromatic and heteroaromatic nitro compounds. These compounds had been split previously into two subsets, 188 compounds that were amenable to regression and 42 that were not. For the 188 compounds, a SAR was found that was as accurate as the best statistical or neural network-generated SARs. The PROGOL SAR has the advantages that it did not need the use of any indicator variables handcrafted by an expert, and the generated rules were easily comprehensible. For the 42 compounds, PROGOL formed a SAR that was significantly (P < 0.025) more accurate than linear regression, quadratic regression, and back-propagation. This SAR is based on an automatically generated structural alert for mutagenicity.
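The contrast between attribute-based and relational descriptions can be made concrete with a minimal sketch. It is not PROGOL and not the encoding used in the study. The `Atom` and `Bond` records, the atom identifiers, the lower-case element symbols, and the `has_nitro_group` test are all hypothetical names chosen for illustration. The sketch shows only that two fragments with identical element counts (an attribute view) can be told apart once bond connectivity is kept (a relational view).

```python
from dataclasses import dataclass


# Hypothetical toy encoding: identifiers and conventions are illustrative only.
@dataclass(frozen=True)
class Atom:
    atom_id: str
    element: str


@dataclass(frozen=True)
class Bond:
    a: str
    b: str
    order: int  # toy convention: 1 = single, 2 = double


def neighbours(bonds, atom_id):
    """Yield (neighbour_id, bond_order) pairs, treating bonds as undirected."""
    for bond in bonds:
        if bond.a == atom_id:
            yield bond.b, bond.order
        elif bond.b == atom_id:
            yield bond.a, bond.order


def has_nitro_group(atoms, bonds):
    """Relational test: some N is bonded to at least two O atoms and one C atom."""
    element = {atom.atom_id: atom.element for atom in atoms}
    for atom in atoms:
        if atom.element != "n":
            continue
        bonded = [element[n] for n, _ in neighbours(bonds, atom.atom_id)]
        if bonded.count("o") >= 2 and "c" in bonded:
            return True
    return False


def attribute_vector(atoms):
    """Attribute-based view: element counts only, so connectivity is lost."""
    counts = {}
    for atom in atoms:
        counts[atom.element] = counts.get(atom.element, 0) + 1
    return counts


# The same four atoms, connected in two different ways.
atoms = [Atom("a1", "c"), Atom("a2", "n"), Atom("a3", "o"), Atom("a4", "o")]

# C-N(=O)-O: nitrogen carries both oxygens and the carbon.
nitro_bonds = [Bond("a1", "a2", 1), Bond("a2", "a3", 2), Bond("a2", "a4", 1)]

# C-O-N=O: carbon is attached to an oxygen, not to the nitrogen.
chain_bonds = [Bond("a1", "a3", 1), Bond("a3", "a2", 1), Bond("a2", "a4", 2)]

print(attribute_vector(atoms))              # identical for both fragments
print(has_nitro_group(atoms, nitro_bonds))  # relational test on the first
print(has_nitro_group(atoms, chain_bonds))  # relational test on the second
```

An ILP system searches for rules over facts of this kind rather than being handed the test in advance. The hand-written `has_nitro_group` function only stands in for the sort of connectivity-based condition such a rule can express.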