Jiang Zhenyu, Du Chengan, Jablensky Assen, Liang Hua, Lu Zudi, Ma Yang, Teo Kok Lay
Department of Mathematics and Statistics, Curtin University, Perth, Australia.
Department of Statistics, George Washington University, Washington, D.C., United States of America.
PLoS One. 2014 Oct 17;9(10):e109454. doi: 10.1371/journal.pone.0109454. eCollection 2014.
Genetic information, such as single nucleotide polymorphism (SNP) data, is widely recognized as useful for predicting disease risk. However, modeling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to handle the complex, nonlinear effects of categorical/discrete SNP covariates in schizophrenia class prediction. A maximum likelihood methodology is proposed to estimate the unknown parameters of the models. Simulation studies demonstrate that the proposed methodology performs well for moderate-size samples. The approach is then applied to schizophrenia classification using a real SNP data set from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in predicting schizophrenia risk class from SNP data, in terms of both Type I/II error rates and ROC curves.
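The models are compared using Type I/II error rates and ROC curves computed from predicted case probabilities. The sketch below shows one way these criteria can be calculated. It is not taken from the paper and makes several assumptions. Labels are coded 1 = schizophrenia case and 0 = control. A fixed 0.5 cutoff is used for the error rates. A Type I error is taken to be a control classified as a case, and a Type II error a case classified as a control. The function names `error_rates`, `roc_curve`, and `auc` are hypothetical.

```python
from typing import List, Sequence, Tuple


def error_rates(y_true: Sequence[int], p_hat: Sequence[float],
                cutoff: float = 0.5) -> Tuple[float, float]:
    """Return (Type I rate, Type II rate) at a given probability cutoff.

    Assumed convention: Type I = FP / (FP + TN), a control predicted as a case;
    Type II = FN / (FN + TP), a case predicted as a control.
    """
    fp = tn = fn = tp = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        if y == 0:
            if pred == 1:
                fp += 1
            else:
                tn += 1
        else:
            if pred == 1:
                tp += 1
            else:
                fn += 1
    type1 = fp / (fp + tn) if fp + tn else 0.0
    type2 = fn / (fn + tp) if fn + tp else 0.0
    return type1, type2


def roc_curve(y_true: Sequence[int], p_hat: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points by sweeping the cutoff over every distinct score.

    Cutoffs are visited from highest to lowest, so both FPR and TPR are
    non-decreasing and the points come out sorted by FPR. This is O(n^2),
    which is fine for an illustration.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for c in sorted(set(p_hat), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_hat) if p >= c and y == 1)
        fp = sum(1 for y, p in zip(y_true, p_hat) if p >= c and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    y = [0, 0, 1, 1]              # toy labels, not WAFSS data
    p = [0.1, 0.4, 0.35, 0.8]     # toy predicted case probabilities
    print(error_rates(y, p))      # (0.0, 0.5)
    print(auc(roc_curve(y, p)))   # 0.75
```

In the toy example, one of the two cases falls below the 0.5 cutoff, so the Type II rate is 0.5 and the Type I rate is 0. The AUC of 0.75 matches the fraction of case/control pairs in which the case receives the higher score (3 of 4). In a real comparison, the same functions would be applied to the predicted probabilities from each competing model (linear, tree-based, and threshold index logistic) on the same held-out data.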