Bayesian Hidden Markov Modeling of Array CGH Data.

Authors

Guha Subharup, Li Yi, Neuberg Donna

Affiliation

Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211.

Publication information

J Am Stat Assoc. 2008 Jun 1;103(482):485-497. doi: 10.1198/016214507000000923.

Abstract

Genomic alterations have been linked to the development and progression of cancer. The technique of comparative genomic hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data.

We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Because the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme, and breast cancer are analyzed, and comparisons are made with some widely used algorithms to illustrate the reliability and success of the technique.
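To make the idea of posterior copy-number calls under a hidden Markov model concrete, the following is a minimal sketch and not the authors' Metropolis-within-Gibbs sampler. It assumes three hypothetical states (loss, neutral, gain), Gaussian emissions with a common standard deviation, and fixed, made-up parameter values, and it computes posterior state probabilities with the standard forward-backward recursions. In a fully Bayesian analysis such as the one described above, the parameters would be sampled rather than fixed. The function name `posterior_state_probs` and the toy log-ratio values are illustrative assumptions.

```python
import numpy as np

def posterior_state_probs(log_ratios, means, sd, trans, init):
    """Forward-backward smoothing for a Gaussian-emission HMM.

    Returns an (n, k) array whose row t holds P(state_t = j | all data),
    given FIXED parameters (an assumption made for illustration only).
    """
    y = np.asarray(log_ratios, dtype=float)
    means = np.asarray(means, dtype=float)
    n, k = y.size, means.size

    # Emission likelihoods up to a constant shared by all states
    # (valid because every state uses the same standard deviation).
    emis = np.exp(-0.5 * ((y[:, None] - means[None, :]) / sd) ** 2)

    # Forward pass, rescaled at each step to avoid underflow.
    alpha = np.zeros((n, k))
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also rescaled at each step.
    beta = np.ones((n, k))
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Toy log2 intensity ratios along a chromosome (invented values).
y = [0.02, -0.05, 0.01, 0.55, 0.60, 0.52, 0.03, -0.62, -0.58, 0.00]

# Hypothetical state means: 0 = loss, 1 = neutral, 2 = gain.
means = [-0.6, 0.0, 0.6]
sd = 0.15

# "Sticky" transition matrix encoding dependence between adjacent clones.
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
init = np.full(3, 1.0 / 3.0)

post = posterior_state_probs(y, means, sd, trans, init)
calls = post.argmax(axis=1)  # most probable state at each position
print(np.round(post, 3))
print(calls)
```

Each row of `post` is a probability distribution over the three states, so a gain or loss can be called when its posterior probability exceeds a chosen threshold, not only when the raw ratio looks extreme.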
