Zhu Meng, Cheng Yang, Dai Juncheng, Xie Lan, Jin Guangfu, Ma Hongxia, Hu Zhibin, Shi Yongyong, Lin Dongxin, Shen Hongbing
Department of Epidemiology, School of Public Health, Nanjing Medical University, Nanjing 211166, China.
Medical Systems Biology Research Center, Tsinghua University School of Medicine.
Zhonghua Liu Xing Bing Xue Za Zhi. 2015 Oct;36(10):1047-52.
To evaluate the predictive power of a lung cancer risk model that combines traditional epidemiological factors with genetic factors.
Data from our previous genome-wide association studies (GWAS) of lung cancer in Chinese populations were used as a training set (Nanjing and Shanghai: 1473 cases vs. 1962 controls) and a testing set (Beijing and Wuhan: 858 cases vs. 1115 controls). All single nucleotide polymorphisms (SNPs) associated with lung cancer risk were systematically selected, and stepwise logistic regression was used to identify independent factors in the training set. A weighted genetic risk score (wGRS) was then calculated. To evaluate the contribution of genetic factors, three risk models were built on the training set: a smoking model (based on smoking status), a genetic risk model (based on the genetic risk score) and a combined model (based on smoking status and the genetic risk score). Model performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), net reclassification improvement (NRI) and integrated discrimination index (IDI). The results were then verified in the testing set.
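As an illustration of the scoring and discrimination steps described above, the following is a minimal sketch and not the authors' implementation. It assumes the common definition of a weighted genetic risk score (risk-allele counts weighted by the log odds ratio of each SNP) and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, odds ratios and genotypes are hypothetical toy values, not data from the study.

```python
import math


def weighted_grs(allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2 per SNP), each weighted by log(OR).

    Assumption: this is the usual wGRS definition; the abstract does not
    spell out the exact weighting used.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(count * math.log(or_) for count, or_ in zip(allele_counts, odds_ratios))


def auc(case_scores, control_scores):
    """Rank-based AUC: share of case/control pairs where the case scores
    higher, with ties counted as half a win."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: three SNPs with made-up per-allele odds ratios.
toy_odds_ratios = [1.2, 1.5, 1.1]
toy_cases = [[2, 1, 1], [1, 2, 0], [2, 2, 2]]
toy_controls = [[0, 0, 1], [1, 0, 0], [1, 2, 0]]

case_scores = [weighted_grs(g, toy_odds_ratios) for g in toy_cases]
control_scores = [weighted_grs(g, toy_odds_ratios) for g in toy_controls]
toy_auc = auc(case_scores, control_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

In a real analysis the score would typically enter a logistic regression alongside smoking status, and the AUC would be computed on the fitted probabilities rather than on the raw score.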
In the training set, the AUCs of the smoking, genetic risk and combined models were 0.65 (0.63-0.66), 0.60 (0.59-0.62) and 0.69 (0.67-0.71), respectively. Both the smoking model and the genetic risk model had significantly lower predictive power than the combined model (P<0.001). Compared with the smoking model, the combined model showed an increase in NRI of 4.57% (2.23%-6.91%) and in IDI of 3.11% (2.52%-3.69%) in the training set (both P<0.001). In the testing set, NRI increased by 2.77%, which was not statistically significant (P=0.069), whereas IDI increased by 3.16%, which was statistically significant (P<0.001).
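To make the two reclassification measures reported above concrete, here is a minimal sketch of how IDI and a category-free NRI can be computed from the predicted risks of an old model (for example, smoking only) and a new model (for example, smoking plus genetic score). The abstract does not say whether the reported NRI was categorical or category-free, so the category-free form here is an assumption. All function names and probabilities are hypothetical toy values.

```python
def mean(values):
    return sum(values) / len(values)


def idi(old_cases, new_cases, old_controls, new_controls):
    """Integrated discrimination index: gain in mean predicted risk among
    cases minus gain in mean predicted risk among controls."""
    return (mean(new_cases) - mean(old_cases)) - (mean(new_controls) - mean(old_controls))


def continuous_nri(old_cases, new_cases, old_controls, new_controls):
    """Category-free NRI: net share of cases whose risk moves up plus net
    share of controls whose risk moves down."""
    def net_up(old, new):
        up = sum(1 for o, n in zip(old, new) if n > o)
        down = sum(1 for o, n in zip(old, new) if n < o)
        return (up - down) / len(old)

    return net_up(old_cases, new_cases) - net_up(old_controls, new_controls)


# Hypothetical predicted risks under the old and new model.
old_cases, new_cases = [0.5, 0.6, 0.4], [0.6, 0.7, 0.3]
old_controls, new_controls = [0.4, 0.5, 0.3], [0.3, 0.5, 0.4]

toy_idi = idi(old_cases, new_cases, old_controls, new_controls)
toy_nri = continuous_nri(old_cases, new_cases, old_controls, new_controls)
print(f"toy IDI = {toy_idi:.4f}, toy NRI = {toy_nri:.4f}")
```

A positive value for either measure indicates that the new model separates cases from controls better than the old one; confidence intervals and P values such as those reported would need a variance estimate or resampling, which this sketch omits.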
This study showed that combining 14 genetic variants with traditional epidemiological factors could improve the predictive power of a lung cancer risk model. The model could be used to screen for individuals at high risk of lung cancer in Chinese populations and could provide evidence for the early diagnosis and treatment of lung cancer.