Henry D R, Block J H
J Med Chem. 1979 May;22(5):465-72. doi: 10.1021/jm00191a002.
Linear and quadratic discriminant analysis, along with k-nearest-neighbor analysis, were investigated for the classification of a set of 51 compounds divided into five therapeutic categories. By superimposing each compound on a pattern structure, as first proposed by Cammarata, eight positions were assigned on the molecule. Each position was coded with the numerical value of a descriptor index. Relative molar refraction, the index used by Cammarata, was compared with a number of molecular connectivity indices. For each of the indices studied, only four of the eight positions contributed significantly to between-class differences. First-order molecular connectivity, calculated as the sum of the contributions of each of the bonds joining a given position, gave consistently fewer misclassifications than the other indices. Using first-order molecular connectivity, validation procedures were performed on the original set of compounds, on random samples drawn from this set, and on a set of ten compounds not included in the analysis. The results were highly data dependent, but they nevertheless suggest that molecular connectivity indices should prove useful in structural classification procedures.
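The per-position descriptor described above can be illustrated with a minimal sketch. It assumes the common Kier-Hall form of first-order connectivity, in which each bond between atoms i and j contributes (delta_i * delta_j)^(-1/2), with delta taken as the number of non-hydrogen neighbors. The abstract does not state which delta values the authors used, so that choice, the function names, and the isobutane example are illustrative assumptions rather than the paper's procedure.

```python
from math import sqrt


def heavy_atom_degrees(bonds):
    """Count non-hydrogen neighbors (simple delta) for each atom.

    `bonds` is a list of (i, j) pairs in a hydrogen-suppressed graph.
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree


def first_order_connectivity(bonds, degree, atom):
    """Sum the bond terms (delta_i * delta_j) ** -0.5 over bonds touching `atom`."""
    total = 0.0
    for i, j in bonds:
        if atom in (i, j):
            total += 1.0 / sqrt(degree[i] * degree[j])
    return total


# Hypothetical example: hydrogen-suppressed isobutane,
# central carbon 0 bonded to three methyl carbons 1, 2, 3.
bonds = [(0, 1), (0, 2), (0, 3)]
degree = heavy_atom_degrees(bonds)
central = first_order_connectivity(bonds, degree, 0)   # three bonds
terminal = first_order_connectivity(bonds, degree, 1)  # one bond
```

In a workflow like the one summarized here, values of this kind for the retained positions would form the feature vector passed to a discriminant or nearest-neighbor classifier.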