Important Neighbors: A Novel Approach to Binary Classification in High Dimensional Data.

Affiliations

Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.

Bioinformatics and Computational Biology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.

Publication information

Biomed Res Int. 2017;2017:7560807. doi: 10.1155/2017/7560807. Epub 2017 Dec 11.

Abstract

K nearest neighbors (KNN) is known as one of the simplest nonparametric classifiers, but in high dimensional settings its accuracy is degraded by nuisance features. In this study, we proposed K important neighbors (KIN) as a novel approach for binary classification in high dimensional problems. To avoid the curse of dimensionality, we fitted a smoothly clipped absolute deviation (SCAD) logistic regression at the initial stage and then built the importance of each feature into the dissimilarity measure by imposing each feature's contribution, as a function of its SCAD coefficient, on the Euclidean distance. This hybrid dissimilarity measure combines information from both features and distances, and so retains the good properties of SCAD penalized regression and of KNN simultaneously. In simulation studies, KIN performed well relative to KNN in terms of both accuracy and dimension reduction. Because the dissimilarity measure exploits the oracle property of SCAD penalized regression, the proposed approach was able to eliminate nearly all of the noninformative features. In very sparse settings, KIN also outperformed support vector machine (SVM) and random forest (RF), which are regarded as among the best classifiers.
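
The idea of weighting the Euclidean distance by feature importance can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that each feature's contribution is "a function of SCAD coefficients", so the choice of the absolute coefficient as the weight is an assumption, the names `weighted_distance` and `kin_predict` are hypothetical, and the coefficient vector is hard-coded rather than obtained from an actual SCAD logistic regression fit.

```python
import math
from collections import Counter


def weighted_distance(x, y, weights):
    """Euclidean distance with a per-feature weight; a weight of 0 drops the feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def kin_predict(train_X, train_y, query, coefs, k=3):
    """Majority vote among the k nearest training points under the weighted distance."""
    # Assumption: the weight of each feature is |coefficient|. The abstract does not
    # specify the exact function of the SCAD coefficients that is used.
    weights = [abs(c) for c in coefs]
    order = sorted(
        range(len(train_X)),
        key=lambda i: weighted_distance(train_X[i], query, weights),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
train_X = [[0.0, 9.0], [0.2, -8.0], [0.1, 7.5], [1.0, -9.0], [0.9, 8.0], [1.1, -7.0]]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical sparse coefficients: the noise feature has been shrunk to exactly zero.
coefs = [2.5, 0.0]

query = [0.95, 8.5]
print(kin_predict(train_X, train_y, query, coefs, k=3))       # weighted distance -> 1
print(kin_predict(train_X, train_y, query, [1.0, 1.0], k=3))  # plain Euclidean -> 0
```

In this toy case the plain Euclidean vote is pulled toward class 0 by the noise feature, while the weighted vote ignores that feature and returns class 1, which is the sense in which a zero coefficient removes a noninformative feature from the neighbor search.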

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c456/5742505/a6c98d5bcdd2/BMRI2017-7560807.001.jpg
