Wold S, Dunn W J, Hellberg S
Environ Health Perspect. 1985 Sep;61:257-68. doi: 10.1289/ehp.8561257.
Empirical models can be constructed that relate changes in toxicity to changes in chemical structure for a series of similar compounds or mixtures. The first step is to translate the variation in structure into quantitative form. This gives a data table, a data matrix denoted by X, which is then analyzed. The same type of model can be used to relate the variation in in vivo data to the variation in a battery of in vitro tests. A single data-analytical model cannot be applied to a set of compounds of diverse chemical structure. For such data sets, separate models must be developed for each subgroup of compounds. The data-analytical problem is then partly one of classification, that is, pattern recognition (PARC). The assumption of structural and biological similarity within each subset of modeled compounds is then essential for empirical models to apply. PARC is often used to classify compounds as active (toxic) or inactive. The data structure is then often asymmetric, which places special demands on the data analysis and makes the traditional PARC methods inapplicable. Depending on the information desired from the data analysis and on the type of data available, four levels of PARC can be distinguished: (I) the data X are used to develop rules for classifying future compounds into one of the classes represented in X; (II) same as I, but the possibility that future compounds belong to "unknown" classes not represented in X is taken into account; (III) same as II, plus the quantitative prediction of one activity variable (here toxicity) in some classes; (IV) same as III, but several quantitative activity (toxicity) variables are predicted.
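As a rough illustration of level II, the sketch below fits a separate principal-component model to each class in X and assigns a new compound to the best-fitting class, or to "unknown" if it fits none. This is only one possible reading of the abstract: the choice of a principal-component class model, the function names, the toy descriptor data, and the cutoff of 3.0 are all assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def fit_class_model(X, n_components=1):
    """Fit a separate principal-component model to one class (rows = compounds)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T  # loadings, shape (n_descriptors, n_components)
    resid = Xc - Xc @ P @ P.T
    # Typical residual of the training compounds (simplified scaling, an assumption).
    dof = max(X.shape[0] - n_components - 1, 1)
    s0 = np.sqrt((resid ** 2).sum() / dof) + 1e-12
    return {"mean": mean, "P": P, "s0": s0}

def residual_distance(model, x):
    """Distance from compound x to the class model's subspace."""
    xc = x - model["mean"]
    e = xc - model["P"] @ (model["P"].T @ xc)
    return float(np.linalg.norm(e))

def classify(models, x, cutoff=3.0):
    """Level II: best-fitting class, or 'unknown' if x fits no class model.

    The cutoff is an arbitrary illustrative value, not a statistical test.
    """
    ratios = {name: residual_distance(m, x) / m["s0"] for name, m in models.items()}
    best = min(ratios, key=ratios.get)
    return best if ratios[best] <= cutoff else "unknown"

# Hypothetical toy data: 10 compounds per class, 3 structure descriptors each.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 10)[:, None]
X_active = t * np.array([1.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((10, 3))
X_inactive = (np.array([5.0, 5.0, 5.0]) + t * np.array([1.0, -1.0, 0.0])
              + 0.05 * rng.standard_normal((10, 3)))

models = {
    "active": fit_class_model(X_active),
    "inactive": fit_class_model(X_inactive),
}

print(classify(models, np.array([0.5, 0.5, 0.0])))
print(classify(models, np.array([5.5, 4.5, 5.0])))
print(classify(models, np.array([10.0, -10.0, 10.0])))
```

Because each class has its own model, a compound that lies far from every class is rejected rather than forced into the nearest one, which is the difference between level II and level I. Levels III and IV would add a regression of one or several toxicity variables within a class, which this sketch does not attempt.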