Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

Author information

Won Sungho, Choi Hosik, Park Suyeon, Lee Juyoung, Park Changyi, Kwon Sunghoon

Affiliations

Department of Public Health Science, Seoul National University, Seoul, Republic of Korea.

Department of Applied Information Statistics, Kyonggi University, Suwon, Republic of Korea.

Publication information

Biomed Res Int. 2015;2015:605891. doi: 10.1155/2015/605891. Epub 2015 Aug 4.

DOI:10.1155/2015/605891

PMID:26346893

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4539442/

Abstract

Owing to recent improvements in genotyping technology, large-scale genetic data can be used to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. Despite these successes, most genetic effects for many complex diseases have been found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the parameter space, and this constraint enables the effects of a very large number of SNPs to be estimated. Various extensions have been suggested, and in this report we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
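
To illustrate how a penalty makes estimation possible when the number of SNPs exceeds the number of samples, the following is a minimal sketch and not the authors' implementation. It assumes a simulated quantitative phenotype and a linear model (the paper itself concerns disease prediction), and every name in it (`ridge_fit`, `lasso_fit`, `soft_threshold`, the simulated genotypes, and the penalty values) is hypothetical. Ridge is computed in its dual form, and LASSO by plain cyclic coordinate descent.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate via the dual form, which only solves an n x n system.

    beta = X^T (X X^T + lam I)^{-1} y, equal to (X^T X + lam I)^{-1} X^T y.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def soft_threshold(z, t):
    """Shrink z toward zero by t and set it to zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(X, y, beta, lam):
    """(1 / 2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = X.shape[0]
    return 0.5 * np.sum((y - X @ beta) ** 2) / n + lam * np.sum(np.abs(beta))

def lasso_fit(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent, starting from beta = 0."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP after centering: leave at zero
            # correlation of SNP j with the residual that excludes SNP j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

# Simulated "large P, small N" data (all values are assumptions).
rng = np.random.default_rng(0)
n, p = 50, 200
maf = rng.uniform(0.1, 0.5, size=p)                     # minor allele frequencies
X = rng.binomial(2, maf, size=(n, p)).astype(float)     # genotypes coded 0/1/2
X -= X.mean(axis=0)                                     # center each SNP
beta_true = np.zeros(p)
beta_true[:5] = 0.5                                     # five causal SNPs
y = X @ beta_true + rng.normal(size=n)
y -= y.mean()

ridge_lam, lasso_lam = 10.0, 0.2
beta_ridge = ridge_fit(X, y, ridge_lam)
beta_lasso = lasso_fit(X, y, lasso_lam)
print("nonzero LASSO coefficients:", np.count_nonzero(beta_lasso), "of", p)
```

Without the penalty the least-squares problem has no unique solution here because p > n. Ridge shrinks all coefficients and keeps them nonzero, whereas the soft-thresholding step in LASSO sets some coefficients exactly to zero, which is why it also performs SNP selection.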

Similar articles

Lost in Translation: On the Problem of Data Coding in Penalized Whole Genome Regression with Interactions.

G3 (Bethesda). 2019 Apr 9;9(4):1117-1129. doi: 10.1534/g3.118.200961.

Simultaneous estimation of gene-gene and gene-environment interactions for numerous loci using double penalized log-likelihood.

Genet Epidemiol. 2006 Dec;30(8):645-51. doi: 10.1002/gepi.20176.

Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease.

Genet Epidemiol. 2013 Feb;37(2):184-95. doi: 10.1002/gepi.21698. Epub 2012 Nov 30.

Multilocus association testing with penalized regression.

Genet Epidemiol. 2011 Dec;35(8):755-65. doi: 10.1002/gepi.20625. Epub 2011 Sep 15.

Differential gene expression detection and sample classification using penalized linear regression models.

Bioinformatics. 2006 Feb 15;22(4):472-6. doi: 10.1093/bioinformatics/bti827. Epub 2005 Dec 13.

Improving LASSO performance for Grey Leaf Spot disease resistance prediction based on genotypic data by considering all possible two-way SNP interactions.

Integr Biol (Camb). 2012 May;4(5):564-7. doi: 10.1039/c2ib00004k. Epub 2012 Apr 2.

Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index.

Genomics Inform. 2016 Dec;14(4):149-159. doi: 10.5808/GI.2016.14.4.149. Epub 2016 Dec 30.

The LASSO and sparse least square regression methods for SNP selection in predicting quantitative traits.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):629-36. doi: 10.1109/TCBB.2011.139. Epub 2011 Oct 17.

Optimism Bias Correction in Omics Studies with Big Data: Assessment of Penalized Methods on Simulated Data.

OMICS. 2019 Apr;23(4):207-213. doi: 10.1089/omi.2018.0191. Epub 2019 Feb 22.

Cited by

Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES).

BMC Bioinformatics. 2024 Feb 2;25(1):56. doi: 10.1186/s12859-024-05677-x.

Evaluation of polygenic risk scores for ovarian cancer risk prediction in a prospective cohort study.

J Med Genet. 2018 Aug;55(8):546-554. doi: 10.1136/jmedgenet-2018-105313. Epub 2018 May 5.

Improving Disease Prediction by Incorporating Family Disease History in Risk Prediction Models with Large-Scale Genetic Data.

Genetics. 2017 Nov;207(3):1147-1155. doi: 10.1534/genetics.117.300283. Epub 2017 Sep 12.
