Cervical Cancer Identification with Synthetic Minority Oversampling Technique and PCA Analysis using Random Forest Classifier.

Affiliations

Bharath Institute of Higher Education and Research, Tamil Nadu, India.

Mohamed Sathak A J Engineering College, Chennai, India.

Publication information

J Med Syst. 2019 Jul 17;43(9):286. doi: 10.1007/s10916-019-1402-6.

Abstract

Cervical cancer is the fourth most common malignant disease among women worldwide. In most cases, its symptoms are not noticeable in the early stages. A number of factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted diseases, and smoking. Identifying those factors and building a classification model that determines whether or not a case is cervical cancer remains a challenging research problem. This study aims to use cervical cancer risk factors to build a classification model with the Random Forest (RF) classification technique, combined with the synthetic minority oversampling technique (SMOTE) and two feature reduction techniques: recursive feature elimination and principal component analysis (PCA). Most medical data sets are imbalanced, because the number of patients is considerably smaller than the number of non-patients. SMOTE is used here to address the imbalance in the data set. The data set comprises 32 risk factors and four target variables: Hinselmann, Schiller, Cytology and Biopsy. The accuracy, sensitivity, specificity, PPA and NPA for the four target variables remain accurate after SMOTE when compared with the values obtained before SMOTE. An RSOnto ontology has been created to visualize the progress in classification performance.
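The oversampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which SMOTE implementation or parameters were used, so everything below is an assumption made for illustration: the function names `smote` and `balance`, the neighbour count `k=5`, the random choice of base sample, and the toy data. The sketch covers only the core idea of SMOTE, which is to create new minority-class rows by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The Random Forest, recursive feature elimination and PCA steps are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Return copies of X and y with synthetic minority rows appended
    until both classes have the same number of rows (binary labels)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Hypothetical toy data: six "non-patient" rows (0) and three "patient" rows (1).
X_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0],
         [2.0, 0.0], [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
y_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance(X_toy, y_toy)
print(y_bal.count(0), y_bal.count(1))
```

In a real evaluation, oversampling would normally be applied to the training split only, so that synthetic rows never appear in the data used to compute accuracy, sensitivity and specificity.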
