Zhang Zhongheng, Zhang Haoyang, Khanal Mahesh Kumar
Department of Emergency Medicine, Sir Run-Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310016, China.
Division of Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
Ann Transl Med. 2017 Nov;5(21):436. doi: 10.21037/atm.2017.08.22.
Risk scores play an important role in clinical medicine. With advances in information technology and the availability of electronic healthcare records, scoring systems can now be developed for less common diseases and populations. The aim of this article is to provide a tutorial on how to develop and validate risk scores with R software, using a virtual dataset. The dataset we generated includes both numeric and categorical variables. First, the numeric variables are converted to factor variables according to cutoff points identified by the LOESS smoother. Then risk points, which are based on the coefficients of a logistic regression model, are assigned to each level of the converted factor variables and of the other categorical variables. Finally, a total score is calculated for each subject to represent the predicted probability of the outcome event. The original dataset is split into training and validation subsets, and discrimination and calibration are evaluated in the validation subset. R code with explanations is presented in the main text.
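The workflow summarized in the abstract can be sketched end to end. The article's own code is in R; what follows is a dependency-free Python sketch under stated assumptions, not the authors' implementation. The virtual dataset (age, sex, binary outcome) and its data-generating model are hypothetical. The age cutoffs of 50 and 70 are fixed by hand rather than read off a LOESS smoother. Points are obtained by dividing each coefficient by the smallest absolute coefficient and rounding, which is one common convention and is assumed here. The total score is mapped to a probability with a one-predictor logistic model fitted on the training subset, and calibration is shown as a per-score table of mean predicted versus observed event rates rather than a formal test. All function names are hypothetical.

```python
import math
import random

# Hypothetical age cutoffs; the article derives cutoffs from a LOESS smoother instead.
AGE_CUTOFFS = [50, 70]

def categorize(value, cutoffs):
    """Interval index of value: 0 below the first cutoff, 1 after it, and so on."""
    return sum(1 for c in cutoffs if value >= c)

def simulate(n, seed=2017):
    """Hypothetical virtual dataset of (age, male, outcome) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, male = rng.uniform(30, 90), int(rng.random() < 0.5)
        lp = -4.0 + 0.05 * age + 0.7 * male  # assumed data-generating model
        rows.append((age, male, int(rng.random() < 1 / (1 + math.exp(-lp)))))
    return rows

def design(row):
    """Intercept, age-level dummies (lowest level is the reference), male indicator."""
    level = categorize(row[0], AGE_CUTOFFS)
    dummies = [1.0 if level == k else 0.0 for k in range(1, len(AGE_CUTOFFS) + 1)]
    return [1.0] + dummies + [float(row[1])]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad, info = [0.0] * k, [[0.0] * k for _ in range(k)]  # score vector, X'WX
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

def assign_points(beta):
    """Assumed rule: divide each non-intercept coefficient by the smallest |coef|, round."""
    unit = min(abs(b) for b in beta[1:] if b != 0)
    return [round(b / unit) for b in beta[1:]]

def total_score(row, points):
    """Sum of the points for the levels a subject falls into (reference levels add 0)."""
    return int(sum(p * x for p, x in zip(points, design(row)[1:])))

def auc(scores, outcomes):
    """C-statistic: P(an event outscores a non-event), ties counted as 1/2."""
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def run_pipeline(n=1000, train_frac=0.7, seed=2017):
    data = simulate(n, seed)
    cut = int(n * train_frac)
    train, valid = data[:cut], data[cut:]  # rows are generated in random order
    beta = fit_logistic([design(r) for r in train], [r[2] for r in train])
    points = assign_points(beta)
    # Assumed: map total score to a probability via a one-predictor logistic model.
    gamma = fit_logistic([[1.0, total_score(r, points)] for r in train], [r[2] for r in train])
    scores = [total_score(r, points) for r in valid]
    outcomes = [r[2] for r in valid]
    probs = [1 / (1 + math.exp(-(gamma[0] + gamma[1] * s))) for s in scores]
    calib = []  # (score, n, mean predicted probability, observed event rate)
    for s in sorted(set(scores)):
        idx = [i for i, v in enumerate(scores) if v == s]
        calib.append((s, len(idx), sum(probs[i] for i in idx) / len(idx),
                      sum(outcomes[i] for i in idx) / len(idx)))
    return {"points": points, "auc": auc(scores, outcomes), "calibration": calib}

if __name__ == "__main__":
    res = run_pipeline()
    print("points (age 50 to <70, age >=70, male):", res["points"])
    print("validation AUC: %.3f" % res["auc"])
    for row in res["calibration"]:
        print("score %d: n=%d, predicted %.2f, observed %.2f" % row)
```

Running the script prints the integer points, the validation C-statistic and the per-score calibration table. Logistic regression is fitted with a hand-written Newton-Raphson loop only to keep the sketch free of dependencies; in R one would normally use `glm(..., family = binomial)`.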