PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects.

Author information

Gurinovich Anastasia, Bae Harold, Farrell John J, Andersen Stacy L, Monti Stefano, Puca Annibale, Atzmon Gil, Barzilai Nir, Perls Thomas T, Sebastiani Paola

Affiliations

Bioinformatics Program, Boston University, Boston, MA, USA.

College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA.

Publication information

Bioinformatics. 2019 Sep 1;35(17):3046-3054. doi: 10.1093/bioinformatics/btz017.

DOI:10.1093/bioinformatics/btz017

PMID:30624692

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6735784/

Abstract

MOTIVATION

Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant affects a phenotype differently in different populations, a genome-wide association study applied to the dataset as a whole may not pinpoint such differences. Identifying population-specific effects of genetic variants is especially important in studies that may eventually lead to the development of diagnostic tests or to drug discovery.

RESULTS

In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
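The recursive bottom-up tree parsing step can be illustrated with a minimal sketch. This is not the PopCluster implementation (which is written in R and relies on PLINK and Eigensoft). It assumes the hierarchical clustering tree is already given, and it swaps the logistic regression model for a simpler Wald test on log odds ratios computed from 2x2 allele-count tables. The function names, the nested-list tree encoding, the continuity correction and the `alpha` threshold are all hypothetical choices made for illustration, and no multiple-testing correction is applied.

```python
import math
from statistics import NormalDist

# A leaf is a tuple of allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
# An internal node is a two-element list [left, right]. Both encodings are
# assumptions of this sketch, not PopCluster's data structures.

def log_odds_ratio(counts):
    """Log odds ratio and its variance, with 0.5 added to every cell."""
    a, b, c, d = (x + 0.5 for x in counts)
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def heterogeneity_p(left, right):
    """Two-sided Wald test that two clusters share the same log odds ratio."""
    lor_l, var_l = log_odds_ratio(left)
    lor_r, var_r = log_odds_ratio(right)
    z = (lor_l - lor_r) / math.sqrt(var_l + var_r)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def parse_tree(node, alpha=0.05):
    """Post-order (bottom-up) walk: merge sibling clusters whose effects do
    not differ significantly, keep them apart otherwise."""
    if isinstance(node, tuple):
        return [node]
    left, right = (parse_tree(child, alpha) for child in node)
    if (len(left) == 1 and len(right) == 1
            and heterogeneity_p(left[0], right[0]) >= alpha):
        return [tuple(l + r for l, r in zip(left[0], right[0]))]
    return left + right

# Toy tree: clusters A and B show a similar effect, C shows the opposite one.
A = (60, 40, 40, 60)
B = (55, 45, 42, 58)
C = (40, 60, 60, 40)
groups = parse_tree([[A, B], C])
print(groups)  # [(115, 85, 82, 118), (40, 60, 60, 40)]
```

In this toy example A and B are pooled because their effects are statistically indistinguishable, while C remains a separate subgroup: the kind of heterogeneity that an analysis of the pooled dataset could mask.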

AVAILABILITY AND IMPLEMENTATION

PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
