Jain Namita, Ghosh Susmita, Ghosh Ashish
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India.
International Institute of Information Technology, Bhubaneswar 751003, India.
Heliyon. 2024 Jul 20;10(15):e34736. doi: 10.1016/j.heliyon.2024.e34736. eCollection 2024 Aug 15.
Existing biclustering algorithms often depend on assumptions such as monotonicity or linearity of feature relations to find biclusters. Although a few algorithms overcome this limitation using density-based methods, they tend to miss many biclusters because they use global criteria to identify dense regions. The proposed method, PF-RelDenBi, uses local variations in the marginal and joint densities of each pair of features to find the subset of observations that forms the basis of the relation between them. It then uses a non-linear feature relation index to find the set of features connected by a common set of observations, which yields a bicluster. This approach finds biclusters based on feature relations even when those relations are non-linear or non-monotonic. Additionally, the proposed method requires no user-supplied parameters, which allows it to be applied to datasets from different domains. To study the behaviour of PF-RelDenBi on datasets with different properties, experiments were carried out on sixteen simulated datasets, and its performance was compared with that of eleven state-of-the-art algorithms. The proposed method was seen to produce better results on most of the simulated datasets. PF-RelDenBi was also used to detect biclusters in five benchmark datasets. For the first two datasets, the detected biclusters were used to generate additional features that improved classification performance. For the other three datasets, the performance of PF-RelDenBi was compared with the eleven state-of-the-art methods in terms of accuracy, normalized mutual information (NMI) and adjusted Rand index (ARI); the proposed method was seen to detect biclusters with greater accuracy. The technique was also applied to a COVID-19 dataset to identify demographic features that are likely to affect the spread of COVID-19.
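The two agreement scores named in the evaluation, NMI and ARI, compare a detected grouping of observations against a reference grouping and ignore how the groups are labelled. The following is a minimal sketch of both in plain Python, not the authors' evaluation code. The function names are illustrative, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from pair counts (Hubert-Arabie form)."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every point in one group
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(true_labels)
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    cells = Counter(zip(true_labels, pred_labels))
    mutual_info = sum(
        (c / n) * log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in cells.items()
    )
    normalizer = (entropy(true_labels) + entropy(pred_labels)) / 2
    if normalizer == 0:
        return 1.0  # both labellings put everything in one group
    return mutual_info / normalizer


# Hypothetical labels: the same grouping under different label names.
reference = [0, 0, 1, 1, 2, 2]
detected = [1, 1, 0, 0, 2, 2]
ari = adjusted_rand_index(reference, detected)
nmi = normalized_mutual_info(reference, detected)
```

Both scores reach their maximum of 1 when the two groupings coincide up to renaming. ARI is corrected for chance, so it can fall below 0 when agreement is worse than expected at random.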