Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

Authors

Wu Steven H, Rodrigo Allen G

Affiliations

Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA.

Department of Biology, Duke University, Box 90338, Durham, NC, 27708, USA.

Publication information

BMC Bioinformatics. 2015 Nov 4;16:357. doi: 10.1186/s12859-015-0810-y.

DOI: 10.1186/s12859-015-0810-y
PMID: 26536860
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4634753/
Abstract

BACKGROUND

Over the last decade, next generation sequencing (NGS) has become widely available, and it is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full-length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA. Both approaches use only the frequencies of nucleotides at each site, without reconstructing either the full-length alignment or the phylogeny.
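As an illustration of the input both approaches rely on, the sketch below computes per-site nucleotide proportions from a set of short reads. It is not taken from the authors' code: the function name `sitewise_frequencies` and the `(start, sequence)` read format are assumptions made for this example, and the reads are assumed to be already mapped to a common coordinate system.

```python
from collections import Counter

def sitewise_frequencies(reads, length):
    """Return one dict of nucleotide proportions per site.

    reads: iterable of (start, sequence) pairs, where start is a 0-based
    offset into a region of the given length (hypothetical input format).
    Sites with no coverage get a proportion of 0.0 for every base.
    """
    counts = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore positions outside the region and ambiguous bases.
            if 0 <= pos < length and base in "ACGT":
                counts[pos][base] += 1
    freqs = []
    for site_counts in counts:
        total = sum(site_counts.values())
        freqs.append({b: (site_counts[b] / total if total else 0.0)
                      for b in "ACGT"})
    return freqs

# Four short reads covering a region of six sites.
reads = [(0, "ACGT"), (2, "GTAA"), (1, "CGTA"), (0, "ATGT")]
freqs = sitewise_frequencies(reads, 6)
print(freqs[1])  # proportions at the second site
```

Note that no read is assigned to an individual and no haplotype is assembled; only the pooled counts at each site are kept.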

RESULTS

We used simulations to evaluate the performance of these algorithms. The results show that LS performs poorly: its bootstrap 95% confidence intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95% highest posterior density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values at a rate approximately equal to that obtained using BEAST, a program that implements Bayesian MCMC estimation of evolutionary parameters from full-length sequences. Because sitewise nucleotide frequencies alone carry less information, the ABC-MCMC 95% HPD intervals are wider than those obtained by BEAST.
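The evaluation criterion described here, whether a 95% HPD interval encloses the true value, can be sketched as follows. This is a generic sample-based HPD calculation (the shortest interval containing the requested share of the posterior samples), not the procedure used in the paper; the function names and the toy samples are assumptions for illustration.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def covers(interval, true_value):
    """True if the interval encloses the true parameter value."""
    low, high = interval
    return low <= true_value <= high

# Toy posterior samples with one outlier at 0.
samples = [0] + list(range(10, 29))
interval = hpd_interval(samples)
print(interval, covers(interval, 15), covers(interval, 5))
```

Repeating the coverage check over many simulated datasets gives the enclosure rate; the interval width (`high - low`) is the quantity reported as larger for ABC-MCMC than for BEAST.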

CONCLUSION

We propose two novel algorithms that estimate evolutionary genetic parameters from the proportion of each nucleotide at each site. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. ABC-MCMC, on the other hand, recovers parameters comparable to those obtained using BEAST, but with wider 95% HPD intervals. One major advantage of ABC-MCMC is that its computational time scales linearly with the number of short-read sequences and is independent of the number of full-length sequences in the original data. This makes it possible to analyse NGS datasets with large numbers of short-read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.
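To make the general idea concrete, here is a minimal likelihood-free MCMC sketch on a toy problem. It is not the SF-ABC implementation and does not model sequence evolution: the single parameter `p` (the probability that a read carries a variant at a site), the simulator, the summary distance, the Uniform(0, 1) prior and the tolerance `eps` are all assumptions chosen for illustration. A proposal is accepted only when data simulated under it give site frequencies close to the observed ones, so no likelihood is ever evaluated.

```python
import random

def simulate_site_freqs(p, n_sites, n_reads, rng):
    """Toy simulator: fraction of reads carrying a variant at each site."""
    return [sum(rng.random() < p for _ in range(n_reads)) / n_reads
            for _ in range(n_sites)]

def distance(a, b):
    """Absolute difference between mean site frequencies (a crude summary)."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

def abc_mcmc(observed, n_reads, n_iter=1500, eps=0.02, step=0.05, seed=1):
    rng = random.Random(seed)
    n_sites = len(observed)
    # Initialise by rejection sampling from the prior so the chain starts
    # in a region where simulated data are close to the observations.
    while True:
        theta = rng.random()
        sim = simulate_site_freqs(theta, n_sites, n_reads, rng)
        if distance(sim, observed) < eps:
            break
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random walk
        # Uniform(0, 1) prior: proposals outside the support are rejected.
        # With a flat prior and symmetric proposal the acceptance ratio is 1,
        # so a proposal is accepted whenever its simulation is within eps.
        if 0.0 < proposal < 1.0:
            sim = simulate_site_freqs(proposal, n_sites, n_reads, rng)
            if distance(sim, observed) < eps:
                theta = proposal
        chain.append(theta)
    return chain

# Simulate "observed" frequencies with a known p, then try to recover it.
data_rng = random.Random(42)
observed = simulate_site_freqs(0.2, 40, 50, data_rng)
chain = abc_mcmc(observed, n_reads=50)
posterior_mean = sum(chain) / len(chain)
print(round(posterior_mean, 3))
```

In this toy version the cost of each iteration is one simulation, which grows in proportion to `n_reads`. That only mirrors the linear scaling stated above; it says nothing about the actual cost of the authors' program.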

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42c1/4634753/776dba685488/12859_2015_810_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42c1/4634753/74046735e8f9/12859_2015_810_Fig2_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42c1/4634753/450f7086d77d/12859_2015_810_Fig4_HTML.jpg
