

GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays.

Affiliation

Department of Electronic Science and Technology, University of Science and Technology of China.

Publication information

Nucleic Acids Res. 2011 Jul;39(12):4928-41. doi: 10.1093/nar/gkr014. Epub 2011 Mar 11.

Abstract

There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies.
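To make the effect of normal cell contamination and baseline shift concrete, the following is a minimal sketch of how a tumor/normal mixture changes the expected SNP array signals. It is not the GPHMM emission model from the paper: the function names, the `normal_fraction` and `baseline_shift` parameters, and the plain log2 ratio used for the Log R Ratio are illustrative assumptions based on simple copy-number averaging.

```python
import math


def _average_copies(n_total, normal_fraction):
    """Average copy number in a mixture of normal (2 copies) and tumor cells."""
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be between 0 and 1")
    return normal_fraction * 2 + (1 - normal_fraction) * n_total


def expected_baf(n_total, n_b, normal_fraction):
    """Expected B allele frequency at a germline-heterozygous SNP.

    Assumption: normal cells carry one B allele out of two copies,
    tumor cells carry n_b B alleles out of n_total copies.
    """
    all_copies = _average_copies(n_total, normal_fraction)
    if all_copies == 0:
        raise ValueError("no DNA copies present in the mixture")
    b_copies = normal_fraction * 1 + (1 - normal_fraction) * n_b
    return b_copies / all_copies


def expected_lrr(n_total, normal_fraction, baseline_shift=0.0):
    """Expected Log R Ratio under a plain log2 model (an assumption).

    baseline_shift is a hypothetical sample-wide offset standing in for
    the signal baseline shift caused by aneuploidy.
    """
    all_copies = _average_copies(n_total, normal_fraction)
    if all_copies == 0:
        raise ValueError("no DNA copies present in the mixture")
    return math.log2(all_copies / 2) + baseline_shift


if __name__ == "__main__":
    # Hemizygous deletion with LOH (1 copy, 0 B alleles) at 10% tumor cells.
    print(expected_baf(n_total=1, n_b=0, normal_fraction=0.9))
    print(expected_lrr(n_total=1, normal_fraction=0.9))
```

Under these assumptions, a hemizygous deletion in a sample with only 10% cancer cells moves the expected B allele frequency only slightly away from 0.5, which illustrates why low tumor content makes aberrations hard to call without modeling contamination explicitly.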


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5022/3130254/1f451cb3dd8e/gkr014f1.jpg
