Giachino Daniela F, Regazzoni Silvia, Bardessono Marco, De Marchi Mario, Gregori Dario
Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Torino, Italy.
Curr Med Res Opin. 2007 Jul;23(7):1657-65. doi: 10.1185/030079907x210471.
To evaluate to what extent an inefficient statistical model affects the study of genetic factors in extra-intestinal manifestations of Crohn's disease (CD) and how clinical predictions can be improved using more adequate techniques.
Extra-intestinal manifestations were studied in 152 CD patients. Three sets of variables were considered: (1) disease characteristics (presentation, behavior, and location); (2) generic risk factors (age, gender, smoking, and familiarity, i.e. family history); and (3) genetic polymorphisms of the NOD2, CD14, TNF, IL12B, and IL1RN genes, whose involvement in CD is known or suspected.
Six statistical classifiers and data mining models were applied: (1) logistic regression as a benchmark; (2) generalized additive model; (3) projection pursuit regression; (4) linear discriminant analysis; (5) quadratic discriminant analysis; and (6) one-layer feed-forward artificial neural networks. Models were selected using the Akaike information criterion, and their accuracy was compared using several indexes.
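As a rough illustration of selection by the Akaike information criterion, the sketch below uses the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood; the candidate with the lowest AIC is preferred. The function names and the example numbers are hypothetical and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Pick the candidate with the lowest AIC.

    `candidates` maps a model label to a (log_likelihood, n_params) pair.
    Returns the best label and the AIC of every candidate.
    """
    scored = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored


# Made-up values for illustration only (not results from the abstract).
example = {
    "clinical_only": (-100.0, 3),
    "clinical_plus_genetic": (-98.0, 6),
}
best_label, all_scores = select_by_aic(example)
```

In this made-up case the larger model gains two log-likelihood units but pays for three extra parameters, so the smaller model is retained.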
Extra-intestinal manifestations occurred in 75 patients. The model restricted to clinical variables selected familiarity, gender, presentation, and behavior as significantly associated with extra-intestinal manifestations, whereas when the genetic factors were also included, familiarity was no longer significant, being replaced by the NOD2, TNF, and IL12B single nucleotide polymorphisms. Projection pursuit regression performed best in predicting individual outcomes (kappa statistic 0.078 [SE 0.09] without and 0.108 [SE 0.075] with genetic information). One-layer artificial neural networks did not show any particular improvement in model accuracy over the other nonlinear techniques.
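The kappa statistic reported above measures agreement between predicted and observed outcomes beyond what chance alone would produce. A minimal sketch, assuming it refers to Cohen's kappa computed on per-patient class labels (the abstract does not spell out the exact variant), is shown below; the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed proportion of agreement and p_e is the agreement
    expected by chance from the marginal class frequencies.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    if p_e == 1.0:
        # Kappa is undefined when both label lists contain one identical class.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so values near 0.1 correspond to only slight agreement beyond chance.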
The correct identification of factors associated with extra-intestinal symptoms in CD, in particular the genetic ones, is highly dependent on the model chosen for the analysis. By using the most sophisticated statistical models, the accuracy of prediction can be strengthened by 10-64%, compared with linear regression.