Bayesian Hidden Markov Modeling of Array CGH Data.

Author information

Subharup Guha, Yi Li, Donna Neuberg

Affiliation

Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211.

Publication information

J Am Stat Assoc. 2008 Jun 1;103(482):485-497. doi: 10.1198/016214507000000923.

Abstract

Genomic alterations have been linked to the development and progression of cancer. The technique of comparative genomic hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data.

We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Because the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme, and breast cancer are analyzed, and comparisons are made with some widely used algorithms to illustrate the reliability and success of the technique.
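The abstract describes a hidden Markov model whose hidden states are copy-number classes. Gains and losses are then identified through posterior probabilities.

The sketch below shows how per-probe posterior state probabilities can be computed for one fixed parameter setting, using the standard scaled forward-backward algorithm. It is not the authors' method. In the paper the model parameters are unknown and are sampled by Metropolis-within-Gibbs, so the reported posterior probabilities would average over those draws rather than condition on a single setting.

Everything in the sketch is an illustrative assumption, not a value from the paper:

- three states (loss, neutral, gain);
- Gaussian emissions on log2 ratios with means -0.5, 0 and 0.5 and a common SD of 0.2;
- a self-transition probability of 0.95;
- the function name `posterior_state_probs`.

```python
import math
from typing import List, Sequence

# Hypothetical 3-state model (assumption): index 0 = loss, 1 = neutral, 2 = gain.
STATES = ("loss", "neutral", "gain")


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Gaussian density, used here as the emission model for one log2 ratio."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_state_probs(
    y: Sequence[float],
    means: Sequence[float] = (-0.5, 0.0, 0.5),  # hypothetical state means
    sd: float = 0.2,                            # hypothetical common SD
    stay: float = 0.95,                         # hypothetical self-transition prob.
    init: Sequence[float] = (0.1, 0.8, 0.1),    # hypothetical initial distribution
) -> List[List[float]]:
    """Scaled forward-backward: P(state_t = k | y, fixed parameters) for each probe t."""
    k = len(means)
    n = len(y)
    # Stay in the current state with prob. `stay`, else move uniformly to another state.
    move = (1.0 - stay) / (k - 1)
    trans = [[stay if i == j else move for j in range(k)] for i in range(k)]
    emit = [[normal_pdf(obs, m, sd) for m in means] for obs in y]

    # Forward pass, normalizing at every probe to avoid numerical underflow.
    alpha = [[0.0] * k for _ in range(n)]
    scale = [0.0] * n
    alpha[0] = [init[j] * emit[0][j] for j in range(k)]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, n):
        for j in range(k):
            alpha[t][j] = emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(k)
            ) / scale[t + 1]

    # Posterior marginals: alpha * beta, renormalized per probe.
    post = []
    for t in range(n):
        w = [alpha[t][j] * beta[t][j] for j in range(k)]
        total = sum(w)
        post.append([v / total for v in w])
    return post


if __name__ == "__main__":
    # Synthetic, illustrative log2 ratios (not data from the paper):
    # a short run of elevated values flanked by near-zero values.
    ratios = [0.02, -0.05, 0.01, 0.55, 0.48, 0.61, 0.03, -0.02]
    for y_t, p in zip(ratios, posterior_state_probs(ratios)):
        probs = "  ".join(f"P({s})={q:.3f}" for s, q in zip(STATES, p))
        print(f"{y_t:+.2f}  {probs}")
```

The Markov dependence is what lets neighbouring probes share strength. A single elevated ratio is discounted by the low switching probability, while a run of elevated ratios accumulates enough evidence to push P(gain) up. Flagging probes whose P(gain) or P(loss) exceeds a chosen threshold mirrors, in simplified form, the posterior-probability calls the abstract describes.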
