Identification of genetic outliers due to sub-structure and cryptic relationships.

Affiliations

Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02115, USA.

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA.

Publication information

Bioinformatics. 2017 Jul 1;33(13):1972-1979. doi: 10.1093/bioinformatics/btx109.

Abstract

MOTIVATION

In order to minimize the effects of genetic confounding on the analysis of high-throughput genetic association studies, e.g. (whole-genome) sequencing (WGS) studies, genome-wide association studies (GWAS), etc., we propose a general framework to assess and to test formally for genetic heterogeneity among study subjects. As the approach fully utilizes the recent ancestor information captured by rare variants, it is especially powerful in WGS studies. Even for relatively moderate sample sizes, the proposed testing framework is able to identify study subjects that are genetically too similar, e.g. cryptic relationships, or that are genetically too different, e.g. population substructure. The approach is computationally fast, enabling the application to whole-genome sequencing data, and straightforward to implement.
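To make the idea of rare variants carrying recent-ancestry information concrete, here is a minimal sketch in Python. It is not the STEGO statistic or the formal test proposed in the paper. The inverse-carrier-count weights, the median/MAD cutoff, the function names, and the toy carrier matrix are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import median


def rarity_weights(carriers):
    """Weight each variant by 1 / (number of carriers). Illustrative assumption."""
    weights = []
    for k in range(len(carriers[0])):
        count = sum(row[k] for row in carriers)
        # A variant seen in fewer than two subjects cannot be shared, so it gets weight 0.
        weights.append(1.0 / count if count >= 2 else 0.0)
    return weights


def pairwise_similarity(carriers, weights):
    """Weighted fraction of variants that both subjects of a pair carry."""
    total = sum(weights)
    sims = {}
    for i, j in combinations(range(len(carriers)), 2):
        shared = sum(w * a * b for w, a, b in zip(weights, carriers[i], carriers[j]))
        sims[(i, j)] = shared / total if total else 0.0
    return sims


def flag_similar_pairs(sims, n_mads=3.0):
    """Flag pairs whose similarity exceeds median + n_mads * MAD (hypothetical cutoff)."""
    values = list(sims.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    return sorted(pair for pair, s in sims.items() if s > centre + n_mads * mad)


# Toy carrier matrix (rows: subjects, columns: variants; 1 = carries the variant).
# Subjects 0 and 1 share three variants that nobody else carries.
toy = [
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
flagged = flag_similar_pairs(pairwise_similarity(toy, rarity_weights(toy)))
print(flagged)
```

In this toy matrix only the pair (0, 1) is flagged: sharing several rare variants raises a pair's weighted similarity far more than sharing the common variant carried by everyone. A real analysis would rely on the formal test described in the paper rather than this ad hoc cutoff.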

RESULTS

Simulation studies illustrate the overall performance of our approach. In an application to the 1000 Genomes Project, we outline an analysis/cleaning pipeline that uses our approach to formally assess whether study subjects are related and whether population substructure is present. In the analysis of the 1000 Genomes Project data, our approach revealed subjects who are most likely related but had previously passed standard QC filters.

AVAILABILITY AND IMPLEMENTATION

An implementation of our method, Similarity Test for Estimating Genetic Outliers (STEGO), is available in the R package stego from GitHub at https://github.com/dschlauch/stego.

CONTACT

dschlauch@fas.harvard.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
