Department of Mathematics, Utah State University, Logan, UT 84322.
IEEE Trans Pattern Anal Mach Intell. 1982 Apr;4(4):357-63. doi: 10.1109/tpami.1982.4767266.
The uniform data functional (UDF) assigns to the output of the fuzzy c-means (Fc-M), or fuzzy ISODATA, algorithm a number that measures the quality, or validity, of the clustering the algorithm produces. For a preselected number of clusters c, the Fc-M algorithm produces c vectors in the space in which the data lie, called cluster centers, which represent points about which the data are concentrated. It also produces, for each data point, c membership values: numbers between zero and one that measure the similarity of the data point to each of the cluster centers. These membership values indicate how the point is classified. They also indicate how well the point has been classified: a value close to one indicates that the point is close to a particular center, whereas uniformly low memberships indicate that the point has not been classified clearly. The UDF combines the memberships so as to indicate how well the data as a whole have been classified, and it is computed as follows. For each data point, compute the ratio of its smallest membership to its largest, and then compute the probability that one could obtain a smaller ratio (indicating better classification) from a clustering of a standard data set in which there is no cluster structure. These probabilities are then averaged over the data set to obtain the value of the UDF.
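The computation described above can be illustrated with a minimal Python sketch. It makes one assumption the abstract does not state: that the probability of obtaining a smaller ratio is estimated empirically from a precomputed list of ratios (`reference_ratios`) taken from a clustering of a structureless reference data set. The abstract does not say how that probability is actually obtained. The function names and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def membership_ratios(memberships):
    """Return, for each data point, the ratio of its smallest to its largest membership.

    memberships: list of rows, one row of c membership values per data point.
    """
    return [min(row) / max(row) for row in memberships]


def uniform_data_functional(memberships, reference_ratios):
    """Average the estimated probability of a smaller ratio over all data points.

    ASSUMPTION: the probability is estimated as the fraction of reference
    ratios that are strictly smaller than the ratio of the data point.
    """
    ref = sorted(reference_ratios)
    ratios = membership_ratios(memberships)
    # bisect_left counts the reference ratios strictly below r.
    probs = [bisect_left(ref, r) / len(ref) for r in ratios]
    return sum(probs) / len(probs)


# Hypothetical memberships for three points and c = 2 clusters.
memberships = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
# Hypothetical ratios from a clustering of data with no cluster structure.
reference_ratios = [0.2, 0.4, 0.6, 0.8]

udf_value = uniform_data_functional(memberships, reference_ratios)
print(udf_value)
```

Under this reading, a point with one dominant membership has a small ratio and contributes a small probability, so lower values of the average correspond to more clearly classified data.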