Won Sungho, Choi Hosik, Park Suyeon, Lee Juyoung, Park Changyi, Kwon Sunghoon
Department of Public Health Science, Seoul National University, Seoul, Republic of Korea.
Department of Applied Information Statistics, Kyonggi University, Suwon, Republic of Korea.
Biomed Res Int. 2015;2015:605891. doi: 10.1155/2015/605891. Epub 2015 Aug 4.
Owing to recent improvements in genotyping technology, large-scale genetic data can be used to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. However, despite these successes, most genetic effects for many complex diseases have been found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called "large P, small N" problem, in which the number of predictors P (here, single nucleotide polymorphisms, SNPs) far exceeds the number of samples N. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, constrain the parameter space, and this constraint makes it possible to estimate the effects of a very large number of SNPs. Various extensions have been suggested; in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and, at least for the diseases under consideration, provide better accuracy than existing methods.
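The effect of constraining the parameter space can be illustrated with a minimal sketch, assuming NumPy and simulated genotypes coded as 0/1/2 minor-allele counts. The sample size, SNP count, causal SNPs, and penalty values are arbitrary choices for illustration, and the helper functions `ridge`, `soft_threshold`, and `lasso` are hypothetical. The LASSO is fitted here by cyclic coordinate descent, one standard algorithm, which is not necessarily what the study used. With P > N, the matrix X^T X is singular, so ordinary least squares has no unique solution, whereas both penalized fits return well-defined estimates.

```python
import numpy as np

# Hypothetical helpers for illustration; not the authors' implementation.

def ridge(X, y, lam):
    """Closed-form ridge estimate minimizing ||y - Xb||^2 + lam * ||b||^2.

    X^T X is singular whenever p > n, but X^T X + lam * I is positive
    definite for any lam > 0, so this system always has a unique solution.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * X_j^T X_j for each column
    r = y.copy()                        # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:          # skip all-zero (monomorphic) columns
                continue
            # Correlation of column j with the partial residual excluding j.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)  # keep the residual in sync
            b[j] = b_new
    return b

# Simulated "large P, small N" data: n individuals, p SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 100, 500
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
sd = G.std(axis=0)
sd[sd == 0] = 1.0                      # guard against monomorphic SNPs
X = (G - G.mean(axis=0)) / sd          # standardized genotypes
causal = np.array([3, 50, 120, 300, 450])  # arbitrary causal SNP indices
y = X[:, causal] @ np.ones(5) + rng.normal(size=n)
y -= y.mean()

lam_ridge = 10.0
b_ridge = ridge(X, y, lam_ridge)
b_lasso = lasso(X, y, lam=0.2)

print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X), "of", p)
print("nonzero ridge coefficients:", np.count_nonzero(b_ridge))
print("nonzero LASSO coefficients:", np.count_nonzero(b_lasso))
print("largest LASSO effects at SNPs:",
      np.sort(np.argsort(-np.abs(b_lasso))[:5]))
```

Ridge shrinks every coefficient toward zero but keeps essentially all of them nonzero, while the L1 penalty of the LASSO sets most coefficients exactly to zero, which is why the LASSO also acts as a SNP selection method.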