Feature selection using feature dissimilarity measure and density-based clustering: application to biological data.

Author information

Sengupta Debarka, Aich Indranil, Bandyopadhyay Sanghamitra

Affiliation

Genome Institute of Singapore, Singapore 138 672, Singapore.

Publication information

J Biosci. 2015 Oct;40(4):721-30. doi: 10.1007/s12038-015-9556-y.

Abstract

Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.
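The dissimilarity measure named in the abstract can be illustrated with a short sketch. It assumes the usual definition of the maximum information compression index as the smaller eigenvalue of the 2x2 covariance matrix of two features, and it assumes population (divide-by-n) covariance. The function names `mici` and `dissimilarity_matrix` are hypothetical and are not taken from the article, and the article's own procedure may differ in its details.

```python
import math
from statistics import fmean


def mici(x, y):
    """Maximum information compression index of two features.

    Assumed definition: the smaller eigenvalue of the 2x2 covariance
    matrix of x and y. It is 0 when the features are linearly dependent
    and grows as they become less linearly related.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (divide-by-n) variances and covariance: an assumption.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] are
    # (var_x + var_y +/- sqrt((var_x - var_y)^2 + 4 * cov_xy^2)) / 2.
    root = math.sqrt((var_x - var_y) ** 2 + 4.0 * cov_xy ** 2)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, (var_x + var_y - root) / 2.0)


def dissimilarity_matrix(features):
    """Symmetric matrix of pairwise MICI values.

    `features` is a list of equal-length numeric sequences, one per feature.
    """
    m = len(features)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d[i][j] = d[j][i] = mici(features[i], features[j])
    return d
```

A matrix built this way could then be handed to any DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`. How the article chooses the DBSCAN parameters and extracts the final feature subset is not stated in the abstract, so that step is left out here.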
