
EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.

Authors

Mahfouz Mohamed A, Shoukry Amin, Ismail Mohamed A

Affiliations

Department of Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Egypt.

Department of Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Egypt; Computer Science and Engineering Dept., Egypt Japan University of Science and Technology, Alexandria, Egypt.

Publication information

Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.

DOI: 10.1016/j.artmed.2020.101985
PMID: 33461685

Abstract

In the microarray-based approach to automated cancer diagnosis, the traditional k-nearest neighbors (kNN) algorithm suffers from several difficulties: the large number of genes (high dimensionality of the feature space), many of them irrelevant (noise), relative to the small number of available samples, and the imbalance in the sample sizes of the target classes. This research provides an ensemble classifier based on decision models derived from kNN that is applicable to problems characterized by small, imbalanced datasets. The proposed classification method is an ensemble of the traditional kNN algorithm and four novel classification models derived from it. The proposed models exploit the increase in density and connectivity using a K-nearest neighbors table (KNN-table) created during the training phase. In the density model, an unseen sample u is classified as belonging to class t if adding u to that class yields the highest increase in density, i.e., u can replace more neighbors in the KNN-table for samples of class t than for samples of any other class. In the other three connectivity models, the mean and standard deviation of the distribution of the average, minimum, and maximum distance to the K neighbors of the members of each class are computed in the training phase. The class t whose distribution u is most likely to belong to is chosen, i.e., adding u to the samples of this class produces the least change to the distribution of the corresponding decision model for class t. Combining the predictions of the four individual models with those of traditional kNN makes the decision space more discriminative. With the help of the KNN-table, which can be updated online in the training phase, improved performance is achieved compared to the traditional kNN algorithm, with a slight increase in classification time.
The proposed ensemble method achieves a significant increase in accuracy compared to the accuracy achieved by any of its base classifiers on the Kentridge, GDS3257, Notterman, Leukemia, and CNS datasets. The method is also compared to several existing ensemble methods and state-of-the-art techniques using different dimensionality reduction techniques on several standard datasets. The results show clear superiority of EKNN over several individual and ensemble classifiers regardless of the choice of gene selection strategy.
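
The abstract gives no formulas, so the following Python sketch is only one plausible reading of the five decision models, not the authors' implementation. Several choices are assumptions made for illustration: Euclidean distance, a KNN-table built from same-class neighbors only, a density score that counts class members whose k-th neighbor would be displaced by u, an absolute z-score as the stand-in for "most likely to belong to the distribution", and an unweighted majority vote to combine the models. All function names are hypothetical.

```python
from collections import Counter
import numpy as np


def build_knn_table(X, y, k):
    """Per class: (members, sorted distances from each member to its k nearest
    same-class neighbors). Assumes every class has at least k + 1 samples."""
    table = {}
    for t in np.unique(y).tolist():
        Xt = X[y == t]
        d = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
        table[t] = (Xt, np.sort(d, axis=1)[:, :k])
    return table


def knn_vote(u, X, y, k):
    """Traditional kNN: majority label among the k nearest training samples."""
    nearest = np.argsort(np.linalg.norm(X - u, axis=1))[:k]
    return Counter(y[nearest].tolist()).most_common(1)[0][0]


def density_vote(u, table):
    """Assumed density model: pick the class in which u would replace the
    current k-th neighbor of the largest number of members."""
    scores = {}
    for t, (Xt, nd) in table.items():
        du = np.linalg.norm(Xt - u, axis=1)
        scores[t] = int(np.sum(du < nd[:, -1]))
    return max(scores, key=scores.get)


def connectivity_vote(u, table, stat):
    """Assumed connectivity model: `stat` (np.mean, np.min or np.max) of the
    distances to the k neighbors is computed per class member; u goes to the
    class whose distribution of that statistic it deviates from least."""
    z = {}
    for t, (Xt, nd) in table.items():
        k = nd.shape[1]
        per_member = stat(nd, axis=1)
        du = np.sort(np.linalg.norm(Xt - u, axis=1))[:k]
        z[t] = abs(stat(du) - per_member.mean()) / (per_member.std() + 1e-12)
    return min(z, key=z.get)


def eknn_predict(u, X, y, table, k):
    """Unweighted majority vote over kNN and the four derived models
    (the combination rule is an assumption)."""
    votes = [
        knn_vote(u, X, y, k),
        density_vote(u, table),
        connectivity_vote(u, table, np.mean),
        connectivity_vote(u, table, np.min),
        connectivity_vote(u, table, np.max),
    ]
    return Counter(votes).most_common(1)[0][0]


# Toy data (not from the paper): two well-separated, imbalanced 2-D classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (8, 2))])
y = np.array([0] * 20 + [1] * 8)
table = build_knn_table(X, y, k=3)
pred_a = eknn_predict(np.array([0.1, -0.1]), X, y, table, k=3)
pred_b = eknn_predict(np.array([5.0, 5.0]), X, y, table, k=3)
```

The table is built once during training and reused for every query, which matches the abstract's remark that the extra models add only a small amount of classification time. The raw density count is not normalized by class size; on strongly imbalanced data a normalized variant may be preferable, but the abstract does not say which is used.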


Similar articles

1
EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.
Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.
2
Fissures segmentation using surface features: content-based retrieval for mammographic mass using ensemble classifier.
Acad Radiol. 2011 Dec;18(12):1475-84. doi: 10.1016/j.acra.2011.08.012.
3
R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data.
Comput Methods Programs Biomed. 2020 Feb;184:105122. doi: 10.1016/j.cmpb.2019.105122. Epub 2019 Oct 8.
4
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Comput Methods Programs Biomed. 2016 Dec;137:195-201. doi: 10.1016/j.cmpb.2016.08.015. Epub 2016 Sep 26.
5
Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data.
J Integr Bioinform. 2016 Dec 22;13(5):304. doi: 10.2390/biecoll-jib-2016-304.
6
Gene expression cancer classification using modified K-Nearest Neighbors technique.
Biosystems. 2019 Feb;176:41-51. doi: 10.1016/j.biosystems.2018.12.009. Epub 2019 Jan 3.
7
The k conditional nearest neighbor algorithm for classification and class probability estimation.
PeerJ Comput Sci. 2019 May 13;5:e194. doi: 10.7717/peerj-cs.194. eCollection 2019.
8
Implementation of ensemble machine learning algorithms on exome datasets for predicting early diagnosis of cancers.
BMC Bioinformatics. 2022 Nov 18;23(1):496. doi: 10.1186/s12859-022-05050-w.
9
Ensemble of a subset of NN classifiers.
Adv Data Anal Classif. 2018;12(4):827-840. doi: 10.1007/s11634-015-0227-5. Epub 2016 Jan 22.
10
Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):58. doi: 10.1186/s12911-021-01432-x.

Cited by

1
Systemic Lupus Erythematosus: How Machine Learning Can Help Distinguish between Infections and Flares.
Bioengineering (Basel). 2024 Jan 17;11(1):0. doi: 10.3390/bioengineering11010090.
2
GraphChrom: A Novel Graph-Based Framework for Cancer Classification Using Chromosomal Rearrangement Endpoints.
Cancers (Basel). 2022 Jun 22;14(13):3060. doi: 10.3390/cancers14133060.
3
An Ensemble-Based Deep Convolutional Neural Network for Computer-Aided Polyps Identification From Colonoscopy.
Front Genet. 2022 Apr 26;13:844391. doi: 10.3389/fgene.2022.844391. eCollection 2022.
4
Automatic COVID-19 detection mechanisms and approaches from medical images: a systematic review.
Multimed Tools Appl. 2022;81(20):28779-28798. doi: 10.1007/s11042-022-12952-7. Epub 2022 Mar 31.
5
Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers.
EPMA J. 2021 Aug 31;12(3):365-381. doi: 10.1007/s13167-021-00252-3. eCollection 2021 Sep.