Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2022 Sep 1;109(9):1582-1590. doi: 10.1016/j.ajhg.2022.07.008.
For the genomics community, allele frequencies within defined groups (or "strata") are useful across multiple research and clinical contexts. Benefits include allowing researchers to identify populations for replication or "look up" studies, enabling researchers to compare population-specific frequencies to validate findings, and facilitating assessment of variant pathogenicity in clinical contexts. However, stratified allele frequencies also raise potential concerns. These include potential re-identification (determining whether or not an individual participated in a given research study based on allele frequencies and individual-level genetic data), harm from associating stigmatizing variants with specific groups, potential reification of race as a biological rather than a socio-political category, and the question of whether presenting stratified frequencies (and the downstream applications that this presentation enables) is consistent with participants' informed consent. The NHLBI Trans-Omics for Precision Medicine (TOPMed) program considered the scientific and social implications of different approaches for adding stratified frequencies to the TOPMed BRAVO (Browse All Variants Online) variant server. We recommend a novel approach: presenting ancestry-specific allele frequencies using a statistical method based upon local genetic ancestry inference. Notably, this approach does not require grouping individuals by either predominant global ancestry or race/ethnicity and, therefore, mitigates re-identification and other concerns, as the mixture distribution of ancestral allele frequencies varies across the genome. Here we describe our considerations and approach, which can assist other genomics research programs facing similar issues of how to define and present stratified frequencies in publicly available variant databases.
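As a rough illustration of why local-ancestry-based frequencies do not require grouping individuals, the sketch below counts alternate alleles per ancestry at a single variant. It is not the TOPMed method itself, which the abstract describes only as a statistical method based on local genetic ancestry inference. It assumes, for simplicity, phased haplotypes with a local ancestry label already assigned to each haplotype at the locus; the function name, inputs, and ancestry labels "A" and "B" are hypothetical.

```python
from collections import defaultdict


def ancestry_specific_frequencies(alleles, local_ancestry):
    """Estimate the alternate-allele frequency per ancestry at one variant.

    alleles: 0/1 alternate-allele indicators, one per haplotype (assumed phased).
    local_ancestry: ancestry label of each haplotype at this locus (assumed
        to come from a separate local ancestry inference step).
    """
    if len(alleles) != len(local_ancestry):
        raise ValueError("alleles and local_ancestry must have equal length")

    alt_counts = defaultdict(int)    # alternate alleles seen per ancestry
    hap_counts = defaultdict(int)    # haplotypes seen per ancestry
    for allele, ancestry in zip(alleles, local_ancestry):
        hap_counts[ancestry] += 1
        alt_counts[ancestry] += allele

    # Frequency = alternate alleles / haplotypes, within each ancestry.
    return {anc: alt_counts[anc] / hap_counts[anc] for anc in hap_counts}


# Toy data: six haplotypes (three individuals) at one variant.
alleles = [1, 0, 0, 1, 1, 0]
local_ancestry = ["A", "A", "B", "B", "B", "A"]
freqs = ancestry_specific_frequencies(alleles, local_ancestry)
```

In this toy input the second individual (haplotypes three and four) and the third (haplotypes five and six) each carry one haplotype of each ancestry at this locus. Their two haplotypes contribute to different strata, so no individual is assigned wholesale to a single group.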