Machine learning algorithm for feature space clustering of mixed data with missing information based on molecule similarity.

Affiliation

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.

Publication information

J Biomed Inform. 2022 Jan;125:103954. doi: 10.1016/j.jbi.2021.103954. Epub 2021 Nov 15.

Abstract

Clustering algorithms have recently attracted significant attention in machine learning applications owing to their strong capability. Nevertheless, existing algorithms still have several issues that need to be resolved. For example, most existing algorithms transform one type of feature into another type, which disregards the explicit properties of the information. In addition, most of them consider all features, which may lead to computational difficulty and result in sub-optimal performance. To address these difficulties, this paper proposes a novel technique for clustering categorical and numerical features, based on feature space clustering of mixed data with missing information (FSCMMI). The procedure involves three stages. First, FSCMMI divides the given dataset according to the missing information in instances and the feature types. The second stage uses a decision-tree procedure to identify the association between instances. Finally, the third stage computes the closeness measure for numerical features and categorical features. We also propose a new training algorithm to cluster mixed datasets. Extensive experimental results on benchmark datasets show that the proposed FSCMMI outperforms several state-of-the-art clustering methods in terms of accuracy and efficiency.
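
The abstract does not give the formulas behind the first and third stages, so the following is only an illustrative sketch, not the FSCMMI algorithm itself. It assumes a Gower-style similarity as a stand-in for the paper's closeness measure, and every function name (`split_by_missing`, `feature_types`, `gower_similarity`) and the toy data are hypothetical. The decision-tree stage is not covered.

```python
from math import isnan

def is_missing(v):
    """Treat None and float NaN as missing values."""
    return v is None or (isinstance(v, float) and isnan(v))

def split_by_missing(rows):
    """Stage-one analogue: separate complete instances from incomplete ones."""
    complete = [r for r in rows if not any(is_missing(v) for v in r)]
    incomplete = [r for r in rows if any(is_missing(v) for v in r)]
    return complete, incomplete

def feature_types(rows):
    """Label each column 'num' or 'cat' from its non-missing values."""
    types = []
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if not is_missing(r[j])]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in vals)
        types.append("num" if numeric else "cat")
    return types

def numeric_ranges(rows, types):
    """Range (max - min) of each numerical column; None for categorical ones."""
    ranges = []
    for j, t in enumerate(types):
        vals = [r[j] for r in rows if not is_missing(r[j])]
        ranges.append(max(vals) - min(vals) if t == "num" and vals else None)
    return ranges

def gower_similarity(a, b, types, ranges):
    """Assumed closeness measure (Gower-style), NOT the paper's formula.

    Numerical features contribute 1 - |x - y| / range, categorical features
    contribute 1 on a match and 0 otherwise. Features missing in either
    instance are skipped, and the result is averaged over the rest.
    """
    total, used = 0.0, 0
    for x, y, t, rng in zip(a, b, types, ranges):
        if is_missing(x) or is_missing(y):
            continue
        if t == "num":
            total += 1.0 - abs(x - y) / rng if rng else 1.0
        else:
            total += 1.0 if x == y else 0.0
        used += 1
    return total / used if used else 0.0

# Hypothetical toy dataset: (numerical, categorical, numerical)
rows = [
    (1.0, "a", 10.0),
    (2.0, "a", None),   # one missing value
    (3.0, "b", 30.0),
]
types = feature_types(rows)
ranges = numeric_ranges(rows, types)
complete, incomplete = split_by_missing(rows)
print(types, len(complete), len(incomplete))
print(gower_similarity(rows[0], rows[1], types, ranges))
```

Skipping features that are missing in either instance keeps numerical and categorical values in their native form rather than converting one type into the other, which is the concern the abstract raises about existing methods.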

