Singh Vikas, Verma Nishchal K
IEEE Trans Nanobioscience. 2022 Mar 8;PP. doi: 10.1109/TNB.2022.3157396.
Clustering of gene expression data has proven useful in various applications, e.g., identifying the natural structure inherent in gene expression, understanding gene functions, mining relevant information from noisy data, and understanding gene regulation. In all these applications, genes, i.e., features, play a crucial role in partitioning the data into different groups. These features may be relevant, irrelevant, or redundant, and they contribute differently to the clustering process. This paper presents a novel approach that accounts for the effect of features during clustering. In the proposed method, the fuzzy c-means objective function is modified using a feature-weighted Euclidean distance together with a monotonically decreasing function. The monotonically decreasing function helps control each feature's contribution during clustering so that the data are partitioned into more relevant clusters. The proposed approach is validated on different standard datasets, its performance is reported using various clustering performance measures, and these measures are compared with those of multiple state-of-the-art methods.
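The general idea of plugging feature weights into the fuzzy c-means distance can be illustrated with a small sketch. The abstract does not give the exact monotonically decreasing function or how the weights are derived, so the code below assumes one: each feature's fuzzy within-cluster dispersion D_j is mapped to a weight through exp(-beta * D_j / mean(D)), and the weights are normalized to sum to 1. The function name `weighted_fcm` and the parameters `beta`, `tol`, and `seed` are hypothetical. Treat this as a generic feature-weighted fuzzy c-means under those assumptions, not as the authors' formulation.

```python
import math
import random


def weighted_fcm(X, c, m=2.0, beta=1.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical feature-weighted fuzzy c-means sketch.

    X: list of n samples, each a list of p feature values.
    Returns (U, V, w): memberships U[n][c], centers V[c][p], weights w[p].
    ASSUMPTION: weights come from exp(-beta * D_j / mean(D)), where D_j is the
    fuzzy within-cluster dispersion of feature j (not taken from the paper).
    """
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    w = [1.0 / p] * p
    V = []

    for _ in range(max_iter):
        # 1) Centers: means of the samples weighted by u_ik^m.
        V = []
        for k in range(c):
            um = [U[i][k] ** m for i in range(n)]
            total = sum(um)
            V.append([sum(um[i] * X[i][j] for i in range(n)) / total
                      for j in range(p)])

        # 2) Feature weights: a monotonically decreasing map of dispersion,
        #    so features that scatter more inside clusters count for less.
        D = [sum(U[i][k] ** m * (X[i][j] - V[k][j]) ** 2
                 for i in range(n) for k in range(c)) for j in range(p)]
        mean_d = sum(D) / p or 1.0
        raw = [math.exp(-beta * d / mean_d) for d in D]
        s = sum(raw)
        w = [r / s for r in raw]

        # 3) Memberships from the weighted squared Euclidean distance.
        expo = 1.0 / (m - 1.0)
        new_U = []
        for i in range(n):
            d2 = [sum(w[j] * (X[i][j] - V[k][j]) ** 2 for j in range(p))
                  for k in range(c)]
            if min(d2) == 0.0:
                # Sample sits exactly on a center: crisp membership there.
                hit = d2.index(0.0)
                new_U.append([1.0 if k == hit else 0.0 for k in range(c)])
                continue
            new_U.append([1.0 / sum((d2[k] / d2[l]) ** expo for l in range(c))
                          for k in range(c)])

        # Stop when memberships no longer change appreciably.
        shift = max(abs(new_U[i][k] - U[i][k])
                    for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return U, V, w


X = [[0.0, 0.0], [0.2, 1.0], [0.1, -1.0],
     [10.0, 0.8], [10.2, -0.9], [10.1, 0.1]]
U, V, w = weighted_fcm(X, c=2)
labels = [row.index(max(row)) for row in U]
print(labels, [round(x, 3) for x in w])
```

In this toy data the first feature separates the two groups while the second is mostly noise, so the first feature should end up with lower within-cluster dispersion and therefore a larger weight. Normalizing by mean(D) keeps the ratio between any two raw weights within exp(±2·beta) when p = 2, so no feature's weight collapses to zero. This is one design choice among many possible decreasing functions.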