Birjandi Mehdi, Ayatollahi Seyyed Mohammad Taghi, Pourahmad Saeedeh, Safarpour Ali Reza
Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, IR Iran.
Gastroenterohepatology Research Center, Shiraz University of Medical Sciences, Shiraz, IR Iran.
Iran Red Crescent Med J. 2016 Aug 9;18(11):e32858. doi: 10.5812/ircmj.32858. eCollection 2016 Nov.
Non-alcoholic fatty liver disease (NAFLD) is the most common form of liver disease in many parts of the world.
The aim of the present study was to identify the most important factors influencing NAFLD using a classification tree (CT) to predict the probability of NAFLD.
This cross-sectional study was conducted in Kavar, a town in the south of Fars province, Iran. A total of 1,600 individuals were selected using stratified, multistage cluster random sampling. Thirty demographic and clinical variables were measured for each individual. Participants were divided into two datasets: training and testing. We used the training dataset (1,120 individuals) to build the CT and the testing dataset (480 individuals) to assess it. The CT was also used to assign class labels and to predict the occurrence of fatty liver.
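The split and the tree-building step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state how individuals were assigned to the two datasets or which splitting criterion the CT used, so the random shuffle, the Gini impurity criterion (as in CART), and the function names `split_train_test`, `gini`, and `best_threshold` are all assumptions. The numeric values in the usage note are made up for illustration only.

```python
import random


def split_train_test(ids, n_train, seed=0):
    """Shuffle ids reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = no NAFLD, 1 = NAFLD)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the cut 'value <= t' that minimizes the weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# 1,600 participants split into 1,120 training and 480 testing individuals.
train_ids, test_ids = split_train_test(range(1600), n_train=1120)
```

For example, with five hypothetical BMI values `[20, 22, 27, 31, 33]` and labels `[0, 0, 0, 1, 1]`, `best_threshold` picks the cut at 27, which separates the two classes completely. A full tree repeats this search over all candidate variables and recurses into each resulting subgroup.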
NAFLD was diagnosed in 22% of the individuals in the sample. In univariate analysis, the following variables were significantly associated with NAFLD (P < 0.05): marital status, history of hepatitis B vaccine, history of surgery, body mass index (BMI), waist-hip ratio (WHR), systolic blood pressure (SBP), diastolic blood pressure (DBP), high-density lipoprotein (HDL), triglycerides (TG), alanine aminotransferase (ALT), cholesterol (CHO), aspartate aminotransferase (AST), glucose (GLU), albumin (AL), and age. The most influential variables for predicting NAFLD based on the CT, in order of importance, were BMI, WHR, TG, GLU, SBP, and ALT. The goodness-of-fit measures of the model on the training and testing datasets were, respectively: prediction accuracy (80%, 75%), sensitivity (74%, 73%), specificity (83%, 77%), and area under the receiver operating characteristic (ROC) curve (78%, 75%).
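For readers unfamiliar with these measures, the sketch below shows how accuracy, sensitivity, and specificity follow from the four cells of a confusion matrix. The function name `classification_metrics` and the counts in the example are hypothetical; the abstract reports only the resulting percentages, not the underlying counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp: NAFLD cases predicted as NAFLD
    fn: NAFLD cases predicted as non-NAFLD
    tn: non-NAFLD individuals predicted as non-NAFLD
    fp: non-NAFLD individuals predicted as NAFLD
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all correct predictions
        "sensitivity": tp / (tp + fn),     # share of true cases detected
        "specificity": tn / (tn + fp),     # share of non-cases correctly ruled out
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fp=5, tn=15, fn=2)
```

Because NAFLD affected only 22% of the sample, accuracy alone can look favorable even when many cases are missed, which is why sensitivity and specificity are reported alongside it.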
The CT is a suitable and easy-to-interpret approach for decision-making and predicting NAFLD.