Robust methods for population stratification in genome wide association studies.

Affiliation

Department of Biostatistics and Programming, Mail Stop 55C-305A, 55 Corporate Drive, Sanofi, Bridgewater, NJ 08807, USA.

Publication information

BMC Bioinformatics. 2013 Apr 19;14:132. doi: 10.1186/1471-2105-14-132.

DOI: 10.1186/1471-2105-14-132
PMID: 23601181
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3637636/
Abstract

BACKGROUND

Genome-wide association studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. Population stratification can cause spurious associations if not adjusted properly. The principal component analysis (PCA) method has been relied upon as a highly useful methodology to adjust for population stratification in these types of large-scale studies. Recently, the linear mixed model (LMM) has also been proposed to account for family structure or cryptic relatedness. However, neither of these approaches may be optimal in properly correcting for sample structures in the presence of subject outliers.
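As a rough illustration of how stratification can produce a spurious association, the following sketch uses entirely hypothetical allele counts (not data from the study). Two subpopulations differ in both allele frequency and case proportion; within each one the allele is unrelated to disease, yet the pooled table suggests a strong effect. The Mantel-Haenszel estimator is used here only as a simple stratified summary and is not one of the methods compared in the paper.

```python
# Hypothetical allele counts as (risk allele, other allele); illustrative only.
strata = {
    "population_A": {"cases": (640, 160), "controls": (160, 40)},   # allele freq 0.8
    "population_B": {"cases": (40, 160), "controls": (160, 640)},   # allele freq 0.2
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: cases (a, b) versus controls (c, d)."""
    return (a * d) / (b * c)

# Odds ratio inside each subpopulation: no association in either one.
within = {name: odds_ratio(*s["cases"], *s["controls"]) for name, s in strata.items()}

# Pooled table that ignores ancestry.
a = sum(s["cases"][0] for s in strata.values())
b = sum(s["cases"][1] for s in strata.values())
c = sum(s["controls"][0] for s in strata.values())
d = sum(s["controls"][1] for s in strata.values())
pooled_or = odds_ratio(a, b, c, d)

# Mantel-Haenszel odds ratio, which combines the per-stratum tables.
num = 0.0
den = 0.0
for s in strata.values():
    (sa, sb), (sc, sd) = s["cases"], s["controls"]
    n = sa + sb + sc + sd
    num += sa * sd / n
    den += sb * sc / n
mh_or = num / den

print("within-population odds ratios:", within)
print("pooled odds ratio:", pooled_or)
print("Mantel-Haenszel odds ratio:", mh_or)
```

With these counts the pooled odds ratio is well above 1 even though each within-population odds ratio equals 1, which is the kind of artifact that ancestry adjustment is meant to remove.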

RESULTS

We propose to use robust PCA combined with k-medoids clustering to deal with population stratification. This approach can adjust for population stratification for both continuous and discrete populations with subject outliers, and it can be considered as an extension of the PCA method and the multidimensional scaling (MDS) method. Through simulation studies, we compare the performance of our proposed methods with several widely used stratification methods, including PCA and MDS. We show that subject outliers can greatly influence the analysis results from several existing methods, while our proposed robust population stratification methods perform very well for both discrete and admixed populations with subject outliers. We illustrate the new method using data from a rheumatoid arthritis study.
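The clustering step can be sketched as follows, under several assumptions: the two-dimensional points are made-up stand-ins for per-subject principal component scores, the robust PCA step is not shown, and the function `k_medoids_exact` is a hypothetical helper that solves the k-medoids objective by brute force (practical only for tiny samples; real analyses would use a heuristic such as PAM). The sketch is meant only to show why medoids are less sensitive to a subject outlier than cluster means.

```python
from itertools import combinations

def manhattan(p, q):
    """Manhattan distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def k_medoids_exact(points, k):
    """Try every set of k data points as medoids and keep the cheapest set."""
    best_cost, best_medoids = None, None
    for candidate in combinations(range(len(points)), k):
        cost = sum(min(manhattan(p, points[m]) for m in candidate) for p in points)
        if best_cost is None or cost < best_cost:
            best_cost, best_medoids = cost, candidate
    # Label each point with the index of its nearest medoid.
    labels = [min(best_medoids, key=lambda m: manhattan(p, points[m])) for p in points]
    return [points[m] for m in best_medoids], labels, best_cost

# Hypothetical scores: two ancestry groups plus one subject outlier at (30, 11).
scores = [
    (0, 0), (2, 0), (0, 2), (2, 2), (1, 1),
    (10, 10), (12, 10), (10, 12), (12, 12), (11, 11),
    (30, 11),
]

medoids, labels, cost = k_medoids_exact(scores, 2)

# Compare the medoid of the outlier's cluster with that cluster's mean.
outlier_label = labels[-1]
members = [p for p, lab in zip(scores, labels) if lab == outlier_label]
centroid = (
    sum(p[0] for p in members) / len(members),
    sum(p[1] for p in members) / len(members),
)

print("medoids:", medoids)
print("medoid of outlier's cluster:", scores[outlier_label])
print("mean of outlier's cluster:", centroid)
```

In this toy data the medoid of the second group stays at an actual group member, while the mean is dragged toward the outlier and falls outside the range spanned by the genuine group members.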

CONCLUSIONS

We demonstrate that subject outliers can greatly influence the analysis results in GWA studies, and propose robust methods for dealing with population stratification that outperform existing population stratification methods in the presence of subject outliers.
