PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects.

Authors

Gurinovich Anastasia, Bae Harold, Farrell John J, Andersen Stacy L, Monti Stefano, Puca Annibale, Atzmon Gil, Barzilai Nir, Perls Thomas T, Sebastiani Paola

Affiliations

Bioinformatics Program, Boston University, Boston, MA, USA.

College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA.

Publication

Bioinformatics. 2019 Sep 1;35(17):3046-3054. doi: 10.1093/bioinformatics/btz017.

Abstract

MOTIVATION

Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

RESULTS

In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
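The core idea above, that a variant's effect can differ between subsets of individuals and vanish in a pooled analysis, can be illustrated with a minimal sketch. This is not PopCluster's implementation (which uses R, PLINK and Eigensoft): it assumes two clusters have already been found, uses made-up allele counts, and swaps in a simple Wald test on the difference of log odds ratios in place of the paper's logistic regression modeling. The function names `log_odds_ratio` and `heterogeneity_test` are hypothetical.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (log odds ratio, standard error) from allele counts.

    A 0.5 continuity correction is added to every cell so that
    zero counts do not cause a division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (case_alt, case_ref, ctrl_alt, ctrl_ref))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def heterogeneity_test(cluster_1, cluster_2):
    """Wald z-test for a difference in log odds ratios between two clusters.

    Assumption: the clusters are independent samples, so the variances
    of the two log odds ratios add.
    """
    lor1, se1 = log_odds_ratio(*cluster_1)
    lor2, se2 = log_odds_ratio(*cluster_2)
    z = (lor1 - lor2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
cluster_a = (120, 80, 80, 120)   # alternate allele enriched in cases
cluster_b = (80, 120, 120, 80)   # alternate allele depleted in cases
pooled = tuple(x + y for x, y in zip(cluster_a, cluster_b))

pooled_lor, pooled_se = log_odds_ratio(*pooled)
z_stat, p_value = heterogeneity_test(cluster_a, cluster_b)

print("pooled log OR:", pooled_lor)
print("heterogeneity z:", z_stat, "p:", p_value)
```

With these illustrative counts the opposite effects in the two clusters cancel in the pooled table, while the between-cluster comparison flags a clear difference. A real analysis would also need to define the clusters from genotype data (the abstract names principal component analysis and hierarchical clustering for that step) and account for testing many candidate splits.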

AVAILABILITY AND IMPLEMENTATION

PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
