Omran Dalia Abd El Hamid, Awad AbuBakr Hussein, Mabrouk Mahasen Abd El Rahman, Soliman Ahmad Fouad, Aziz Ashraf Omar Abdel
Endemic Medicine and Hepatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
Asian Pac J Cancer Prev. 2015;16(1):381-5. doi: 10.7314/apjcp.2015.16.1.381.
Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis that can explore large volumes of data to discover hidden patterns and relationships. Our aim was to develop a non-invasive algorithm for the prediction of HCC. Such an algorithm should be economical, reliable, easy to apply, and acceptable to domain experts.
This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV)-related chronic liver disease (CLD): 135 with HCC, 116 with cirrhosis but without HCC, and 64 with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC.
The decision tree algorithm predicted HCC with a recall (sensitivity) of 83.5% and a precision (specificity) of 83.3% using only routine data. Of the 315 instances, 259 (82.2%) were correctly classified and 56 (17.8%) were incorrectly classified. Out of 29 attributes, serum alpha-fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST >64 U/L, and ascites were also associated with HCC.
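The reported classification percentages can be checked from the counts given above. The following is a minimal arithmetic sketch only; the variable and function names are illustrative and do not come from the study.

```python
# Counts as reported in the abstract.
correct = 259      # correctly classified instances
incorrect = 56     # incorrectly classified instances
total = correct + incorrect

# Cohort composition as reported: HCC, cirrhosis without HCC, chronic hepatitis C.
cohort = 135 + 116 + 64


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


accuracy = pct(correct, total)      # share of correctly classified instances
error_rate = pct(incorrect, total)  # share of incorrectly classified instances

print(total, cohort, accuracy, error_rate)
```

The two totals agree with the 315 enrolled patients, and the rounded percentages match the 82.2% and 17.8% quoted in the text.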
Data mining analysis allows the discovery of hidden patterns and enables the development of models that predict HCC from routine data, as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). A score of more than 2 of the 5 risk variables predicted HCC with a sensitivity of 96% and a specificity of 82%.
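The scoring rule described above might be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes that each of the five variables contributes one equally weighted point, which the abstract does not state, and the `Patient` class, its field names, and both function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding the five routine variables named in the abstract."""
    afp_ng_ml: float   # serum alpha-fetoprotein, ng/ml
    male: bool
    cirrhosis: bool
    ast_u_l: float     # aspartate aminotransferase, U/L
    ascites: bool


def risk_score(p: Patient) -> int:
    """Count how many of the five risk variables are present.

    Assumption: one point per variable, equally weighted.
    """
    return sum([
        p.afp_ng_ml >= 50.3,  # AFP cutoff reported in the abstract
        p.male,
        p.cirrhosis,
        p.ast_u_l > 64,       # AST threshold reported in the abstract
        p.ascites,
    ])


def predicts_hcc(p: Patient) -> bool:
    """Apply the reported rule: more than 2 of the 5 variables present."""
    return risk_score(p) > 2


example = Patient(afp_ng_ml=120.0, male=True, cirrhosis=True, ast_u_l=40.0, ascites=False)
print(risk_score(example), predicts_hcc(example))
```

The example values are made up for demonstration and are not patient data from the study.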