Selker H P, Griffith J L, Patil S, Long W J, D'Agostino R B
Department of Medicine, New England Medical Center, Boston, MA 02111, USA.
J Investig Med. 1995 Oct;43(5):468-76.
There is increasing interest in mathematical methods for the prediction of medical outcomes. Three methods have attracted particular attention: logistic regression, classification trees (such as ID3 and CART), and neural networks. To compare their relative performance, we used a large clinical database to develop models with each method.
Each modeling method was used to generate predictive instruments for acute cardiac ischemia (which includes acute myocardial infarction and unstable angina pectoris). The models used prospectively collected clinical data on 5773 patients who presented over a two-year period to the emergency departments of six hospitals with chest pain or symptoms suggesting acute ischemia. This data set was then split into training (n = 3453) and test (n = 2320) sets. Of 200 available variables, modeling was restricted to those available within the first 10 minutes of emergency department care (history, physical examination, and electrocardiogram).
When the number of variables was limited to eight, a practical number for input in the real-time clinical setting, the receiver-operating characteristic (ROC) curve area, used as the measure of diagnostic performance, was 0.887 for logistic regression, 0.858 for the classification tree model, and 0.902 for the neural network. When the number of variables used by a model was not limited, the ROC areas were 0.905 for logistic regression, 0.861 for the classification tree model, and 0.923 for the neural network. Among these models, the neural networks had noticeably poorer calibration. When the output of each unlimited model was presented to each of the other methods as an additional independent variable, the ROC areas of the resulting "hybrid" models (0.858 to 0.920) were not significantly better than those of the original unlimited models.
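The results above rest on two separate properties of a probabilistic model: discrimination, summarized by the ROC curve area, and calibration, meaning how closely predicted probabilities match observed event rates. The sketch below illustrates both under stated assumptions. The abstract does not say how the authors computed ROC areas or assessed calibration, so the function names `roc_area` and `calibration_table` are hypothetical. The code assumes binary labels (1 = acute cardiac ischemia, 0 = not) and model outputs expressed as probabilities. `roc_area` uses the standard concordance interpretation of the ROC area: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. `calibration_table` uses simple equal-width probability bins, which is one common choice and not necessarily the paper's.

```python
from typing import List, Sequence, Tuple


def roc_area(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC curve area via pairwise concordance (Mann-Whitney form).

    Assumes labels are 1 (event) or 0 (no event). Ties count as 1/2.
    O(n_pos * n_neg), which is adequate for a few thousand test cases.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[int, int, float, float]]:
    """Compare mean predicted probability with the observed event rate.

    Predictions are grouped into equal-width bins on [0, 1]; a p of 1.0
    goes into the last bin. Returns (bin index, count, mean predicted,
    observed rate) for each non-empty bin. A well-calibrated model has
    mean predicted close to observed rate in every bin.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for i, b in enumerate(bins):
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            rows.append((i, len(b), mean_pred, observed))
    return rows
```

For example, `roc_area([0.7, 0.4, 0.4, 0.1], [1, 1, 0, 0])` returns 0.875, because three of the four positive/negative pairs are correctly ordered and one is tied. Cubing every probability leaves that ROC area unchanged, since the ranking is preserved, but it shifts the predicted probabilities and so changes `calibration_table`. This is why a model can have the highest ROC area yet noticeably poorer calibration, as reported for the neural networks.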
Logistic regression, classification tree, and neural network models can all provide excellent performance in predicting medical outcomes for clinical decision aids and policy models. Their ultimate limitations appear to stem from the information available in the data (a "data barrier") rather than from their respective intrinsic properties. Choices among these methods would seem best based on the needs of the specific application, rather than on the premise that any one of these methods is intrinsically more powerful.