

Fast MCMC sampling for hidden Markov Models to determine copy number variations.

Affiliation

Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA.

Publication

BMC Bioinformatics. 2011 Nov 2;12:428. doi: 10.1186/1471-2105-12-428.

PMID: 22047014
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371636/
Abstract

BACKGROUND

Hidden Markov Models (HMMs) are often used to analyze Comparative Genomic Hybridization (CGH) data. By segmenting observation sequences, they identify chromosomal aberrations or copy number variations. For efficiency reasons, the parameters of an HMM are often estimated by maximum likelihood, and a segmentation is then obtained with the Viterbi algorithm. This introduces considerable uncertainty into the segmentation. Bayesian approaches avoid that uncertainty by integrating out the parameters using Markov Chain Monte Carlo (MCMC) sampling. Although the advantages of Bayesian approaches have been clearly demonstrated, likelihood-based approaches are still preferred in practice for their lower running times. Datasets from high-density arrays and next-generation sequencing amplify these problems.
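The maximum-likelihood baseline described above ends with Viterbi decoding. Below is a minimal pure-Python sketch of that step for an HMM with Gaussian emissions, assuming the parameters have already been estimated. The three-state model (loss, normal, gain) and its means, standard deviations and transition probabilities are hypothetical. So are the function names, which are not taken from the paper or from the GHMM library. The sketch returns one most-likely path for fixed parameters, which is exactly why it carries no measure of segmentation uncertainty.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd**2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def viterbi(obs, log_init, log_trans, means, sds):
    """Most likely hidden-state path of a Gaussian-emission HMM, computed in log space."""
    n = len(means)
    # delta[k]: best log-probability of any state path ending in state k at the current position
    delta = [log_init[k] + gaussian_logpdf(obs[0], means[k], sds[k]) for k in range(n)]
    backptrs = []
    for x in obs[1:]:
        # best predecessor for every state, using the previous delta
        ptr = [max(range(n), key=lambda j: delta[j] + log_trans[j][k]) for k in range(n)]
        delta = [delta[ptr[k]] + log_trans[ptr[k]][k] + gaussian_logpdf(x, means[k], sds[k])
                 for k in range(n)]
        backptrs.append(ptr)
    # trace back from the best final state
    state = max(range(n), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # hypothetical 3-state model: 0 = loss, 1 = normal, 2 = gain (log2-ratio means)
    means, sds = [-0.5, 0.0, 0.5], [0.1, 0.1, 0.1]
    log_init = [math.log(1 / 3)] * 3
    log_trans = [[math.log(0.98) if i == j else math.log(0.01) for j in range(3)] for i in range(3)]
    obs = [0.02, -0.01, 0.03, 0.48, 0.52, 0.55, 0.01, -0.02]
    print(viterbi(obs, log_init, log_trans, means, sds))  # [1, 1, 1, 2, 2, 2, 1, 1]
```

Working in log space avoids underflow on long probe sequences. The cost is O(T·K²) for T observations and K states, independent of how similar neighbouring observations are.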

RESULTS

We propose an approximate sampling technique to speed up MCMC sampling. It is inspired by the compression of discrete sequences in HMM computations and by kd-trees, which let it leverage spatial relations between data points in typical data sets.
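The general idea of compressing the sequence before sampling can be illustrated with a simplified sketch. First, group runs of consecutive, similar observations into blocks and assume each block occupies a single hidden state. Then run forward-filtering backward-sampling (FFBS, the usual way to draw a state path from its posterior inside a Gibbs sampler) over blocks rather than single observations.

The abstract does not specify the authors' exact approximation. The greedy fixed-width grouping below is our own stand-in for their kd-tree construction. All function names, the `width` parameter and the model parameters are hypothetical.

```python
import math
import random


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd**2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def compress(obs, width):
    """Hypothetical stand-in for kd-tree grouping (not the paper's method): greedily cut
    the sequence into blocks of consecutive values within `width` of the block's first value.
    Returns half-open (start, end) index pairs; assumes obs is non-empty."""
    blocks, start = [], 0
    for i in range(1, len(obs)):
        if abs(obs[i] - obs[start]) > width:
            blocks.append((start, i))
            start = i
    blocks.append((start, len(obs)))
    return blocks


def sample_states_blockwise(obs, blocks, log_init, log_trans, means, sds, rng):
    """Approximate FFBS in which every block is forced to share a single hidden state.
    Returns one sampled state per observation."""
    n_states = len(means)

    def block_loglik(b, k):
        s, e = blocks[b]
        # emissions of every point in the block plus (length - 1) self-transitions
        return (sum(gaussian_logpdf(x, means[k], sds[k]) for x in obs[s:e])
                + (e - s - 1) * log_trans[k][k])

    # forward filtering over blocks instead of individual observations
    alphas = [[log_init[k] + block_loglik(0, k) for k in range(n_states)]]
    for b in range(1, len(blocks)):
        prev = alphas[-1]
        alphas.append([logsumexp([prev[j] + log_trans[j][k] for j in range(n_states)])
                       + block_loglik(b, k) for k in range(n_states)])

    def draw(log_weights):
        # sample an index with probability proportional to exp(log_weights)
        m = max(log_weights)
        return rng.choices(range(n_states), weights=[math.exp(w - m) for w in log_weights])[0]

    # backward sampling: last block from its filtered distribution, then each
    # earlier block conditioned on the state already drawn for its successor
    states = [draw(alphas[-1])]
    for b in range(len(blocks) - 2, -1, -1):
        nxt = states[-1]
        states.append(draw([alphas[b][j] + log_trans[j][nxt] for j in range(n_states)]))
    states.reverse()

    # expand block states back to one state per observation
    return [states[b] for b, (s, e) in enumerate(blocks) for _ in range(e - s)]


if __name__ == "__main__":
    obs = [0.02, -0.01, 0.03, 0.48, 0.52, 0.55, 0.01, -0.02]
    blocks = compress(obs, width=0.1)
    print(blocks)  # [(0, 3), (3, 6), (6, 8)]
    log_trans = [[math.log(0.98) if i == j else math.log(0.01) for j in range(3)] for i in range(3)]
    path = sample_states_blockwise(obs, blocks, [math.log(1 / 3)] * 3, log_trans,
                                   [-0.5, 0.0, 0.5], [0.1] * 3, random.Random(0))
    print(path)
```

The forward and backward passes iterate over B blocks, so their transition work drops from O(T·K²) to O(B·K²). The per-point emission sums remain O(T·K) here, but they could be computed from cached per-block sums and sums of squares. A full sampler would alternate this step with draws of the emission and transition parameters, which are not shown.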

CONCLUSIONS

We test our approximate sampling method on simulated and biological ArrayCGH datasets and on high-density SNP arrays. It achieves speed-ups of 10 to 60 on the ArrayCGH data and 90 on the SNP arrays, while remaining competitive with state-of-the-art Bayesian approaches.

AVAILABILITY

An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/497e6d13f13b/1471-2105-12-428-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/15ddd2ec7040/1471-2105-12-428-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/ed407d5bd27a/1471-2105-12-428-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/8491d8e3fb1e/1471-2105-12-428-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/c2f7518d9a55/1471-2105-12-428-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/f000efffdc65/1471-2105-12-428-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/ce943cbd9ffe/1471-2105-12-428-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc0/3371636/0931e1a915be/1471-2105-12-428-8.jpg

Similar articles

1
Fast MCMC sampling for hidden Markov Models to determine copy number variations.
BMC Bioinformatics. 2011 Nov 2;12:428. doi: 10.1186/1471-2105-12-428.
2
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.
PLoS Comput Biol. 2016 May 13;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871. eCollection 2016 May.
3
Estimating uncertainty in MRF-based image segmentation: A perfect-MCMC approach.
Med Image Anal. 2019 Jul;55:181-196. doi: 10.1016/j.media.2019.04.014. Epub 2019 May 8.
4
Compressed computations using wavelets for hidden Markov models with continuous observations.
PLoS One. 2023 Jun 6;18(6):e0286074. doi: 10.1371/journal.pone.0286074. eCollection 2023.
5
Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis.
BMC Bioinformatics. 2018 Jan 3;19(1):3. doi: 10.1186/s12859-017-2003-3.
6
Bayesian restoration of a hidden Markov chain with applications to DNA sequencing.
J Comput Biol. 1999 Summer;6(2):261-77. doi: 10.1089/cmb.1999.6.261.
7
A simple introduction to Markov Chain Monte-Carlo sampling.
Psychon Bull Rev. 2018 Feb;25(1):143-154. doi: 10.3758/s13423-016-1015-8.
8
Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model.
BMC Bioinformatics. 2010 Oct 31;11:539. doi: 10.1186/1471-2105-11-539.
9
A Bayesian nonparametric approach for uncovering rat hippocampal population codes during spatial navigation.
J Neurosci Methods. 2016 Apr 1;263:36-47. doi: 10.1016/j.jneumeth.2016.01.022. Epub 2016 Feb 5.
10
Detecting copy number variations from array CGH data based on a conditional random field model.
J Bioinform Comput Biol. 2010 Apr;8(2):295-314. doi: 10.1142/s021972001000480x.

Cited by

1
Nuclear and mitochondrial population genetics of the Australasian arbovirus vector Culex annulirostris (Skuse) reveals strong geographic structure and cryptic species.
Parasit Vectors. 2024 Dec 4;17(1):501. doi: 10.1186/s13071-024-06551-8.
2
Compressed computations using wavelets for hidden Markov models with continuous observations.
PLoS One. 2023 Jun 6;18(6):e0286074. doi: 10.1371/journal.pone.0286074. eCollection 2023.
3
Fuzzy methods for the detection of copy number variations in comparative genomic hybridization arrays.
Saudi J Biol Sci. 2020 Dec;27(12):3647-3654. doi: 10.1016/j.sjbs.2020.08.007. Epub 2020 Aug 13.
4
Bayesian localization of CNV candidates in WGS data within minutes.
Algorithms Mol Biol. 2019 Sep 23;14:20. doi: 10.1186/s13015-019-0154-7. eCollection 2019.
5
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.
PLoS Comput Biol. 2016 May 13;12(5):e1004871. doi: 10.1371/journal.pcbi.1004871. eCollection 2016 May.
6
Dynamic expression of 3' UTRs revealed by Poisson hidden Markov modeling of RNA-Seq: implications in gene expression profiling.
Gene. 2013 Sep 25;527(2):616-23. doi: 10.1016/j.gene.2013.06.052. Epub 2013 Jul 9.
7
Fast detection of de novo copy number variants from SNP arrays for case-parent trios.
BMC Bioinformatics. 2012 Dec 12;13:330. doi: 10.1186/1471-2105-13-330.