Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.

Authors

Dou Jinzhuang, Sun Baoluo, Sim Xueling, Hughes Jason D, Reilly Dermot F, Tai E Shyong, Liu Jianjun, Wang Chaolong

Affiliations

Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore.

Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore.

Publication information

PLoS Genet. 2017 Sep 29;13(9):e1007021. doi: 10.1371/journal.pgen.1007021. eCollection 2017 Sep.

DOI:10.1371/journal.pgen.1007021

PMID:28961250

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5636172/

Abstract

Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in targeted sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole-exome sequencing (WES) dataset of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficients and classify genetic relatedness using off-target sequencing data downsampled to ~0.15X depth. In application to the full WES dataset without downsampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.
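The downward bias described in the abstract can be illustrated with a small simulation. The sketch below is not the SEEKIN estimator: it uses a classic moment estimator of kinship for a homogeneous sample, and it mimics genotype uncertainty by shrinking every dosage halfway toward its population mean. The function name `kinship_matrix`, the shrinkage factor of 0.5, the marker count, and the allele-frequency range are all assumptions chosen for illustration.

```python
import numpy as np


def kinship_matrix(dosages, allele_freqs):
    """Classic moment estimator of pairwise kinship (homogeneous sample).

    dosages: (n_individuals, n_markers) allele counts or dosages in [0, 2].
    allele_freqs: (n_markers,) population allele frequencies.
    """
    g = np.asarray(dosages, dtype=float)
    p = np.asarray(allele_freqs, dtype=float)
    centered = g - 2.0 * p  # subtract the expected dosage at each marker
    return centered @ centered.T / (4.0 * np.sum(p * (1.0 - p)))


rng = np.random.default_rng(0)
n_markers = 5000  # hypothetical marker count
p = rng.uniform(0.1, 0.9, size=n_markers)  # hypothetical allele frequencies

# Three independent haplotypes: two for the parent, one from the other parent.
haps = (rng.random((3, n_markers)) < p).astype(int)
parent = haps[0] + haps[1]

# The child inherits one parental allele per marker (markers treated as unlinked).
pick_first = rng.random(n_markers) < 0.5
child = np.where(pick_first, haps[0], haps[1]) + haps[2]

genotypes = np.vstack([parent, child])
phi_true = kinship_matrix(genotypes, p)

# Assumed noise model: uncertain calls pull each dosage halfway toward 2p.
shrink = 0.5
noisy_dosages = 2.0 * p + shrink * (genotypes - 2.0 * p)
phi_naive = kinship_matrix(noisy_dosages, p)

print("parent-child kinship, accurate genotypes:", phi_true[0, 1])
print("parent-child kinship, shrunken dosages:  ", phi_naive[0, 1])
```

Under this assumed noise model, the naive estimate is scaled by the square of the shrinkage factor, so a true parent-offspring pair (expected kinship 0.25) would be estimated far below its true value. This is the kind of bias that a method accounting for genotype uncertainty would need to correct.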

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b22e/5636172/39770b3183d0/pgen.1007021.g001.jpg

Similar articles

Accurate local-ancestry inference in exome-sequenced admixed individuals via off-target sequence reads.

Am J Hum Genet. 2013 Nov 7;93(5):891-9. doi: 10.1016/j.ajhg.2013.10.008.

Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

BMC Bioinformatics. 2015 Jan 16;16:4. doi: 10.1186/s12859-014-0418-7.

Privacy-aware estimation of relatedness in admixed populations.

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac473.

An ancestry informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data.

BMC Genomics. 2019 Dec 30;20(Suppl 12):1007. doi: 10.1186/s12864-019-6333-6.

Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa084.

Moment estimators of relatedness from low-depth whole-genome sequencing data.

BMC Bioinformatics. 2022 Jun 24;23(1):254. doi: 10.1186/s12859-022-04795-8.

NGSremix: a software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data.

G3 (Bethesda). 2021 Aug 7;11(8). doi: 10.1093/g3journal/jkab174.

Estimating FST and kinship for arbitrary population structures.

PLoS Genet. 2021 Jan 19;17(1):e1009241. doi: 10.1371/journal.pgen.1009241. eCollection 2021 Jan.

Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation.

Am J Hum Genet. 2015 Jun 4;96(6):926-37. doi: 10.1016/j.ajhg.2015.04.018. Epub 2015 May 28.

Cited by

Private detection of relatives in forensic genomics using homomorphic encryption.

BMC Med Genomics. 2024 Nov 19;17(1):273. doi: 10.1186/s12920-024-02037-9.

A brief guide to analyzing expression quantitative trait loci.

Mol Cells. 2024 Nov;47(11):100139. doi: 10.1016/j.mocell.2024.100139. Epub 2024 Oct 22.

MethylGenotyper: Accurate Estimation of SNP Genotypes and Genetic Relatedness from DNA Methylation Data.

Genomics Proteomics Bioinformatics. 2024 Sep 13;22(3). doi: 10.1093/gpbjnl/qzae044.

Continental-scale associations of Arabidopsis thaliana phyllosphere members with host genotype and drought.

Nat Microbiol. 2024 Oct;9(10):2748-2758. doi: 10.1038/s41564-024-01773-z. Epub 2024 Sep 6.

Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques.

bioRxiv. 2024 Jan 11:2024.01.09.574911. doi: 10.1101/2024.01.09.574911.

Genome-Wide Association Study of Gallstone Disease Identifies Novel Candidate Genomic Variants in a Latino Community of Southwest USA.

J Racial Ethn Health Disparities. 2025 Feb;12(1):234-240. doi: 10.1007/s40615-023-01867-0. Epub 2023 Nov 28.

An unbiased kinship estimation method for genetic data analysis.

BMC Bioinformatics. 2022 Dec 6;23(1):525. doi: 10.1186/s12859-022-05082-2.

Privacy-aware estimation of relatedness in admixed populations.

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac473.

A comparative analysis of genomic and phenomic predictions of growth-related traits in 3-way coffee hybrids.

G3 (Bethesda). 2022 Aug 25;12(9). doi: 10.1093/g3journal/jkac170.

Moment estimators of relatedness from low-depth whole-genome sequencing data.

BMC Bioinformatics. 2022 Jun 24;23(1):254. doi: 10.1186/s12859-022-04795-8.
