Bayesian Hidden Markov Modeling of Array CGH Data.

Authors

Subharup Guha, Yi Li, Donna Neuberg

Affiliation

Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211.

Publication

J Am Stat Assoc. 2008 Jun 1;103(482):485-497. doi: 10.1198/016214507000000923.

PMID:22375091

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3286622/

Abstract

Genomic alterations have been linked to the development and progression of cancer. The technique of comparative genomic hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. These intensity ratios carry information about DNA copy number. Practical issues, such as the admixture of normal cells with tumor cells in tissue specimens and normalization errors, make statistical methods necessary for learning about genomic alterations from array CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms that characterize genomic profiles. Specifically, there is a need for algorithms that identify gains and losses in copy number on statistical grounds, rather than merely detecting trends in the data.

We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends, such as extended regions of altered copy number, are also detected. Because the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme, and breast cancer are analyzed, and comparisons with some widely used algorithms illustrate the reliability and success of the technique.
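To illustrate how hidden Markov dependence turns noisy intensity ratios into per-probe posterior probabilities of gain or loss, here is a minimal sketch. It is not the authors' method. It assumes three hidden states (loss, neutral, gain), Gaussian emissions on the log2 ratio scale with one shared standard deviation, and fixed, hypothetical parameter values. It then runs the standard scaled forward-backward algorithm. The paper instead places priors on the model parameters and samples them by Metropolis-within-Gibbs, so this code shows only the state-probability calculation for one fixed parameter setting. The function names `normal_pdf` and `posterior_state_probs` are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density used as the (assumed) emission model for one log2 ratio."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_state_probs(obs, means, sigma, trans, init):
    """Scaled forward-backward for an HMM with Gaussian emissions.

    obs   : log2 intensity ratios ordered by genomic position
    means : emission mean for each hidden copy-number state
    sigma : emission standard deviation shared by all states (assumption)
    trans : trans[j][k] = P(state k at probe t+1 | state j at probe t)
    init  : initial state distribution
    Returns post[t][k] = P(state k at probe t | all observations).
    """
    K, T = len(means), len(obs)
    emit = [[normal_pdf(y, means[k], sigma) for k in range(K)] for y in obs]

    # Forward pass, normalised at every probe to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[k] * emit[0][k] for k in range(K)]
        else:
            a = [emit[t][k] * sum(alpha[t - 1][j] * trans[j][k] for j in range(K))
                 for k in range(K)]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)

    # Backward pass, divided by the same scaling constants as the forward pass.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[j][k] * emit[t + 1][k] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]

    # Combine and renormalise to get per-probe posterior state probabilities.
    post = []
    for t in range(T):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        post.append([v / s for v in g])
    return post


if __name__ == "__main__":
    # Hypothetical parameter values for three states: loss, neutral, gain.
    labels = ["loss", "neutral", "gain"]
    means = [-0.5, 0.0, 0.5]
    sigma = 0.15
    trans = [[0.90, 0.08, 0.02],
             [0.025, 0.95, 0.025],
             [0.02, 0.08, 0.90]]
    init = [0.1, 0.8, 0.1]
    log2_ratios = [0.02, -0.05, 0.01, 0.55, 0.48, 0.61, 0.03, -0.02]

    post = posterior_state_probs(log2_ratios, means, sigma, trans, init)
    for y, p in zip(log2_ratios, post):
        best = max(range(len(labels)), key=lambda k: p[k])
        print(f"{y:+.2f}  P(loss)={p[0]:.3f}  P(gain)={p[2]:.3f}  call={labels[best]}")
```

The sticky transition matrix, with large diagonal entries, encodes the dependence between neighbouring probes. A single moderately elevated ratio must be strongly supported by the data before the chain switches state, while a run of elevated ratios is readily called a gain. This sketch assigns each probe its most probable state. In a fully Bayesian treatment, the state means, variance, and transition probabilities would themselves be sampled, and the per-probe probabilities would be averaged over the MCMC draws.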
