A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing.

Affiliation

Department of Statistics, The Pennsylvania State University, 325 Thomas, University Park, PA 16802, USA.

Publication information

Bioinformatics. 2013 Apr 1;29(7):878-85. doi: 10.1093/bioinformatics/btt065. Epub 2013 Feb 13.

Abstract

MOTIVATION

Next-generation sequencing (NGS) technologies have enabled whole-genome discovery and analysis of genetic variants in many species of interest. Individuals are often sequenced at low coverage for detecting novel variants, phasing haplotypes and inferring population structures. Although several tools have been developed for SNP and genotype calling in NGS data, haplotype phasing is often done separately on the called genotypes.

RESULTS

We propose a dynamic Bayesian Markov model (DBM) for simultaneous genotype calling and haplotype phasing in low-coverage NGS data of unrelated individuals. Our method is fully probabilistic and produces consistent inference of genotypes, haplotypes and recombination probabilities. Using data from the 1000 Genomes Project, we demonstrate that DBM not only yields more accurate results than some popular methods, but also provides a novel characterization of haplotype structures at the individual level for visualization, interpretation and comparison in downstream analysis. DBM is a powerful and flexible tool that can be applied to many sequencing studies. Its statistical framework can also be extended to accommodate broader scopes of data.
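
As an illustration of why genotype calling is uncertain at low coverage, the following minimal sketch computes per-site genotype posteriors from read counts. It is not the DBM model and is not taken from the paper: it is a standard single-site binomial read model with a Hardy-Weinberg prior, and the function name, the error rate and the allele frequency are all hypothetical choices made for this example.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01, alt_freq=0.2):
    """Posterior over genotypes (0, 1 or 2 alternate alleles) at one biallelic site.

    Illustrative assumptions (not from the paper): reads are independent,
    each read is miscalled with probability `error_rate`, and the prior
    follows Hardy-Weinberg proportions with alternate-allele frequency
    `alt_freq`.
    """
    n = ref_reads + alt_reads
    # Hardy-Weinberg prior for 0, 1 and 2 copies of the alternate allele.
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt = [error_rate, 0.5, 1 - error_rate]
    # Binomial likelihood of the observed read counts under each genotype.
    likelihoods = [
        comb(n, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads for p in p_alt
    ]
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


# Two reference reads and no alternate reads: a heterozygote is still plausible.
low_coverage = genotype_posteriors(ref_reads=2, alt_reads=0)
# Thirty reference reads: the heterozygote is effectively ruled out.
high_coverage = genotype_posteriors(ref_reads=30, alt_reads=0)
```

Under these assumptions, a site covered by only two reads leaves substantial posterior mass on more than one genotype. That is the kind of uncertainty that motivates calling genotypes jointly with phasing rather than phasing hard calls afterwards.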

AVAILABILITY AND IMPLEMENTATION

http://stat.psu.edu/~yuzhang/software/dbm.tar.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

De novo inference of stratification and local admixture in sequencing studies.
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S17. doi: 10.1186/1471-2105-14-S5-S17. Epub 2013 Apr 10.

Haplotype reconstruction using perfect phylogeny and sequence data.
BMC Bioinformatics. 2012 Apr 19;13 Suppl 6(Suppl 6):S3. doi: 10.1186/1471-2105-13-S6-S3.
