Bi Wenjian, Kang Guolian, Zhao Yanlong, Cui Yuehua, Yan Song, Li Yun, Cheng Cheng, Pounds Stanley B, Borowitz Michael J, Relling Mary V, Yang Jun J, Liu Zhifa, Pui Ching-Hon, Hunger Stephen P, Hartford Christine M, Leung Wing, Zhang Ji-Feng
Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R.C.
Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, U.S.A.
Ann Hum Genet. 2015 Jul;79(4):294-309. doi: 10.1111/ahg.12117. Epub 2015 May 11.
In genetic association studies of an ordered categorical phenotype, it is usual either to regroup the categories of the phenotype into two and then apply logistic regression (LG), or to apply ordered logistic (oLG) or ordered probit (oPRB) regression, which account for the ordinal nature of the phenotype. However, these methods may lose statistical power or fail to control the type I error rate, owing to their model assumptions and/or unstable parameter estimation algorithms, when the genetic variant is rare or the sample size is limited. To solve this problem, we propose a set-valued (SV) system model to identify genetic variants associated with an ordinal categorical phenotype. We couple this model with an SV system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10^-6, whereas oLG and oPRB did not in some cases. LG had significantly less power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. We argue that SV should be employed in genetic association studies of ordered categorical phenotypes.
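To make the contrast between the ordinal and dichotomized analyses concrete, the following minimal sketch computes category probabilities under a latent-variable threshold model of the ordered probit kind, then collapses them into two groups as a dichotomized LG analysis would. It is an illustration only, not the SV identification algorithm of the paper: the function names, the effect size `beta`, and the threshold values are all hypothetical, and the latent error is assumed to be standard normal.

```python
import math


def normal_cdf(x):
    """Standard normal CDF; handles +/- infinity via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(genotype, beta, thresholds):
    """P(phenotype = k | genotype) when a latent value beta*genotype + e,
    with e ~ N(0, 1), is observed only through the interval between
    consecutive thresholds that it falls into (assumed model)."""
    mean = beta * genotype
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf(c - mean) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


def dichotomize(probs, split):
    """Collapse ordered categories into two groups: the first `split`
    categories versus the rest. The ordering within each group is lost."""
    low = sum(probs[:split])
    return [low, 1.0 - low]


if __name__ == "__main__":
    beta = 0.3                # hypothetical per-allele effect on the latent scale
    thresholds = [-0.5, 0.5]  # hypothetical cut points giving 3 categories
    for g in (0, 1, 2):       # minor-allele count
        p = category_probs(g, beta, thresholds)
        print(g, [round(v, 3) for v in p], [round(v, 3) for v in dichotomize(p, 1)])
```

After dichotomization, the shift of probability between the two upper categories is no longer visible, which is one way to see why an analysis that regroups categories can lose power relative to methods that use the full ordering.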