Li Jialiang, Gao Ming, D'Agostino Ralph
Department of Statistics and Applied Probability, National University of Singapore, Singapore.
Duke University-NUS Graduate Medical School, Singapore.
Stat Med. 2019 Jun 15;38(13):2477-2503. doi: 10.1002/sim.8103. Epub 2019 Jan 30.
Deep learning neural network models such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, guidance on evaluating the performance of these methods is not yet readily available to practitioners. We provide a tutorial on evaluating classification accuracy for various state-of-the-art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures, such as sensitivity, specificity, and the area under the receiver operating characteristic curve, are not applicable, and suitable extensions must be considered. In this paper, we review a few important statistical concepts for multicategory classification accuracy and demonstrate their utility for various learning algorithms with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and find broader application in biostatistics.
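As a rough illustration of why binary measures need extending, the sketch below computes two simple multicategory summaries from predicted labels: the proportion of each true class that is classified correctly (a class-wise analogue of sensitivity) and the overall proportion classified correctly. It is not taken from the paper, whose own code is in R; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_correct_rate(y_true, y_pred):
    """Proportion of each true class that is predicted correctly.

    With two classes these are sensitivity and specificity; with more
    classes there is one rate per class.
    """
    # Count (true label, predicted label) pairs, i.e. confusion matrix cells.
    cells = Counter(zip(y_true, y_pred))
    # Number of observations in each true class.
    totals = Counter(y_true)
    return {c: cells[(c, c)] / totals[c] for c in sorted(totals)}


def overall_correct_proportion(y_true, y_pred):
    """Share of all observations whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy three-category labels, invented for this example (not real data).
y_true = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]

rates = per_class_correct_rate(y_true, y_pred)
overall = overall_correct_proportion(y_true, y_pred)
print(rates)
print(overall)
```

In this toy case the overall proportion hides the fact that class "b" is classified correctly only half the time, which is one reason class-wise and other multicategory summaries are reported alongside a single accuracy figure.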