Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data.

Affiliation

Department of Mathematical Sciences, SUNY Binghamton University, Vestal, NY 13850, USA.

Publication information

Genes (Basel). 2021 May 13;12(5):736. doi: 10.3390/genes12050736.

DOI: 10.3390/genes12050736
PMID: 34068248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8153154/
Abstract

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. The problem is becoming more pressing as the rapid growth of biobanks and electronic health records enables the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods in both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates when case-control ratios are unbalanced. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by unbalanced case-control ratios, commenting on the advantages and limitations of each. We also explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to GWAS. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.
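
The type I error inflation mentioned in the abstract can be illustrated with a small exact calculation. The sketch below is not an LMM and is not taken from the article: it assumes a simplified single-variant score test with a normal approximation, a binary carrier genotype, and no covariates or relatedness. The function name `exact_type1_error` and all sample sizes are hypothetical choices for illustration. With genotypes held fixed and case labels exchangeable under the null, the number of carriers who are cases follows a hypergeometric distribution, so the rejection probability can be summed exactly rather than simulated.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_type1_error(n, n_cases, n_carriers, alpha):
    """Exact null rejection probability of a normal-approximation score test.

    Assumption (illustrative only): genotype is 0/1 (carrier or not), and
    X, the number of carriers among cases, is hypergeometric under the null.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_case = n_cases / n
    mean = n_carriers * p_case
    # Score-test variance: ybar * (1 - ybar) * sum_i (g_i - gbar)^2
    var = p_case * (1 - p_case) * n_carriers * (1 - n_carriers / n)
    total = comb(n, n_carriers)
    lo = max(0, n_carriers - (n - n_cases))
    hi = min(n_carriers, n_cases)
    rate = 0.0
    for x in range(lo, hi + 1):
        z = (x - mean) / sqrt(var)
        if abs(z) > z_crit:
            # Hypergeometric probability of exactly x carrier cases
            rate += comb(n_cases, x) * comb(n - n_cases, n_carriers - x) / total
    return rate


ALPHA = 1e-3
# Hypothetical settings: 10,000 individuals, 20 carriers of a rare variant.
unbalanced = exact_type1_error(10_000, 100, 20, ALPHA)    # 1% cases
balanced = exact_type1_error(10_000, 5_000, 20, ALPHA)    # 50% cases

print(f"nominal alpha:        {ALPHA}")
print(f"unbalanced (1:99):    {unbalanced:.5f}")
print(f"balanced (1:1):       {balanced:.5f}")
```

Under these assumed settings the unbalanced design rejects far more often than the nominal level, while the balanced design stays below it. The skew comes from the discrete, asymmetric null distribution of the statistic when both cases and carriers are rare, which the normal approximation does not capture.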


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df5f/8153154/09e4065464c2/genes-12-00736-g001.jpg
