Fast spatial ancestry via flexible allele frequency surfaces.

Author affiliations

Department of Statistics, University of Washington, Seattle, WA 98195; Department of Human Genetics, University of Chicago, Chicago, IL 60637; and Department of Biomathematics, Human Genetics, and Statistics, University of California Los Angeles, Los Angeles, CA 90095, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2915-22. doi: 10.1093/bioinformatics/btu418. Epub 2014 Jul 9.

Abstract

MOTIVATION

Unique modeling and computational challenges arise in locating the geographic origin of individuals based on their genetic backgrounds. Single-nucleotide polymorphisms (SNPs) vary widely in informativeness, allele frequencies change non-linearly with geography and reliable localization requires evidence to be integrated across a multitude of SNPs. These problems become even more acute for individuals of mixed ancestry. It is hardly surprising that matching genetic models to computational constraints has limited the development of methods for estimating geographic origins. We attack these related problems by borrowing ideas from image processing and optimization theory. Our proposed model divides the region of interest into pixels and operates SNP by SNP. We estimate allele frequencies across the landscape by maximizing a product of binomial likelihoods penalized by nearest neighbor interactions. Penalization smooths allele frequency estimates and promotes estimation at pixels with no data. Maximization is accomplished by a minorize-maximize (MM) algorithm. Once allele frequency surfaces are available, one can apply Bayes' rule to compute the posterior probability that each pixel is the pixel of origin of a given person. Placement of admixed individuals on the landscape is more complicated and requires estimation of the fractional contribution of each pixel to a person's genome. This estimation problem also succumbs to a penalized MM algorithm.
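The Bayes' rule step for unmixed individuals can be illustrated with a minimal sketch. It assumes that the allele frequency surfaces have already been estimated, that SNPs are independent, that each genotype is a Binomial(2, f) draw at the pixel frequency f, and that the prior over pixels is uniform unless one is supplied. The function name `pixel_posterior` and its parameters are hypothetical and are not part of the OriGen package.

```python
import numpy as np

def pixel_posterior(genotypes, freq_surfaces, prior=None, eps=1e-9):
    """Posterior probability that each pixel is the pixel of origin.

    genotypes:     (n_snps,) allele counts, each 0, 1 or 2
    freq_surfaces: (n_snps, n_pixels) estimated allele frequency per pixel
    prior:         (n_pixels,) prior over pixels; uniform if None (assumption)
    """
    g = np.asarray(genotypes, dtype=float)[:, None]
    # Clip frequencies away from 0 and 1 so the logarithms stay finite.
    f = np.clip(np.asarray(freq_surfaces, dtype=float), eps, 1.0 - eps)
    # Binomial(2, f) log-likelihood summed over SNPs for every pixel.
    # The binomial coefficient does not depend on the pixel, so it is
    # omitted; it cancels when the posterior is normalized.
    log_lik = (g * np.log(f) + (2.0 - g) * np.log1p(-f)).sum(axis=0)
    if prior is None:
        prior = np.full(f.shape[1], 1.0 / f.shape[1])
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()  # shift for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: 2 SNPs, 3 pixels, individual homozygous at both SNPs.
freqs = np.array([[0.1, 0.5, 0.9],
                  [0.2, 0.5, 0.8]])
post = pixel_posterior([2, 2], freqs)
```

Working in log space keeps the product over many SNPs from underflowing, which matters when evidence is integrated across thousands of markers. The admixed case described in the abstract is not covered by this sketch, since it requires estimating fractional contributions from several pixels.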

RESULTS

We applied the model to the Population Reference Sample (POPRES) data. The model gives better localization for both unmixed and admixed individuals than existing methods despite using just a small fraction of the available SNPs. Computing times are comparable with the best competing software.

AVAILABILITY AND IMPLEMENTATION

Software will be freely available as the OriGen package in R.

CONTACT

ranolaj@uw.edu or klange@ucla.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
