Shousha Hend Ibrahim, Awad Abubakr Hussein, Omran Dalia Abdelhamid, Elnegouly Mayada Mohamed, Mabrouk Mahasen
Endemic Medicine Department, Faculty of Medicine, Cairo University.
Computer Science Department, Faculty of Computers and Information, Cairo University.
Jpn J Infect Dis. 2018 Jan 23;71(1):51-57. doi: 10.7883/yoken.JJID.2017.089. Epub 2017 Dec 26.
The IL28B single nucleotide polymorphism (rs12979860) is an etiology-independent predictor of hepatitis C virus (HCV)-related hepatic fibrosis. Data mining is a method of predictive analysis that can explore large volumes of information from health records to discover hidden patterns and relationships. This study aimed to evaluate and compare the accuracy of scoring systems, namely the aspartate aminotransferase-to-platelet ratio index (APRI) and the fibrosis-4 (FIB-4) index, with that of data mining for predicting HCV-related advanced fibrosis. This retrospective study included 427 patients with chronic hepatitis C. We used data mining analysis to construct a decision tree with the reduced error pruning (REP) technique, and then used the Auto-WEKA tool to select the best classifier out of 39 algorithms for predicting advanced fibrosis. APRI had a sensitivity of 0.523 and a specificity of 0.831; FIB-4 had a sensitivity of 0.415 and a specificity of 0.917. The REPTree algorithm predicted advanced fibrosis with a sensitivity of 0.749, a specificity of 0.729, and a receiver operating characteristic (ROC) area of 0.796. Out of the 16 attributes, REPTree selected IL28B genotype as the best predictor of advanced fibrosis. Using Auto-WEKA, the multilayer perceptron (MLP) neural model was selected as the best predictive algorithm, with a sensitivity of 0.825, a specificity of 0.811, and an ROC area of 0.880. Thus, the MLP outperformed APRI, FIB-4, and REPTree in predicting advanced fibrosis in patients with chronic hepatitis C.
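For readers unfamiliar with the two scoring systems and the reported metrics, the following is a minimal Python sketch, not the authors' code. It uses the commonly cited definitions of APRI and FIB-4 and computes sensitivity and specificity from binary labels. The function names, the AST upper limit of normal of 40 U/L, and all input values are illustrative assumptions; the abstract does not state the cutoffs or reference ranges used in the study.

```python
import math


def apri(ast_u_l, ast_upper_limit_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) * 100 / platelet count (10^9/L)."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_10e9_l


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age * AST) / (platelet count (10^9/L) * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = advanced fibrosis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only (not study data).
    # The AST upper limit of 40 U/L is an assumed laboratory reference value.
    print("APRI :", apri(ast_u_l=80, ast_upper_limit_u_l=40, platelets_10e9_l=100))
    print("FIB-4:", fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_10e9_l=100))

    # Hypothetical biopsy-confirmed labels versus predictions from any classifier.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print("Sensitivity, specificity:", sensitivity_specificity(truth, preds))
```

The same `sensitivity_specificity` helper applies to a score thresholded at a chosen cutoff or to the output of a classifier such as REPTree or an MLP, which is how the figures in the abstract can be compared on a common footing.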