Eftekhar Behzad, Mohammad Kazem, Ardebili Hassan Eftekhar, Ghodsi Mohammad, Ketabchi Ebrahim
Department of Neurosurgery, Sina Hospital, Tehran University, Tehran, Iran.
BMC Med Inform Decis Mak. 2005 Feb 15;5:3. doi: 10.1186/1472-6947-5-3.
In recent years, outcome prediction models using artificial neural network (ANN) and multivariable logistic regression analysis have been developed in many areas of health care research. Both methods have advantages and disadvantages. In this study we compared the performance of ANN and multivariable logistic regression models in predicting outcomes in head trauma and studied the reproducibility of the findings.
One thousand pairs of logistic regression and ANN models were compared in this study. The models were based on initial clinical data (Glasgow Coma Scale (GCS), tracheal intubation status, age, systolic blood pressure, respiratory rate, pulse rate and injury severity score) and the outcome of 1271 mainly head-injured patients. For each of the one thousand pairs of ANN and logistic models, the area under the receiver operating characteristic (ROC) curve, the Hosmer-Lemeshow (HL) statistic and the accuracy rate were calculated and compared using paired t-tests.
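The three per-model measures named above can be illustrated with a minimal sketch. This is not the authors' code; the abstract does not state the software used. The function names, the default of 10 groups for the HL statistic, and the 0.5 classification threshold for accuracy are all assumptions made for illustration.

```python
def auc(y_true, y_score):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic (lower means better calibration).
    Cases are sorted by predicted probability and split into `groups`
    roughly equal bins (assumed default: 10)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases classified correctly at an assumed 0.5 threshold."""
    correct = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)
```

Discrimination (AUC) and calibration (HL) are computed from the predicted probabilities themselves, whereas accuracy depends on the chosen threshold, which is one reason the three measures can rank two models differently.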
ANN significantly outperformed logistic models in both discrimination and calibration but underperformed in accuracy. In 77.8% of comparisons the area under the ROC curve, and in 56.4% of comparisons the HL statistic, for the neural network model was superior to that for the logistic model. In 68% of comparisons the accuracy of the logistic model was superior to that of the neural network model.
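The two kinds of summary reported above, a paired t-test across model pairs and the percentage of pairs in which one model was superior, can be sketched as follows. This is a hedged illustration only: the helper names `paired_t` and `win_fraction` are hypothetical, and the sketch returns the t statistic and degrees of freedom without computing a p-value.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length
    lists of per-pair metric values (e.g. AUC of ANN vs. logistic)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def win_fraction(a, b, higher_is_better=True):
    """Fraction of pairs in which the first model is strictly better.
    Use higher_is_better=False for the HL statistic, where lower is better."""
    if higher_is_better:
        wins = sum(x > y for x, y in zip(a, b))
    else:
        wins = sum(x < y for x, y in zip(a, b))
    return wins / len(a)
```

With one list of metric values per model across the one thousand pairs, `win_fraction` would yield figures of the kind quoted above (for example 0.778 for the area under the ROC curve), while `paired_t` tests whether the mean difference departs from zero.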
ANN significantly outperformed the logistic models in both discrimination and calibration but lagged behind in accuracy. This study clearly showed that any single comparison between these two models might not reliably represent the true end results. External validation of the designed models, using larger databases with different outcome rates, is necessary to get an accurate measure of performance outside the development population.