Bayesian hierarchical mixture modeling to assign copy number from a targeted CNV array.

Affiliation

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, United Kingdom.

Publication information

Genet Epidemiol. 2011 Sep;35(6):536-48. doi: 10.1002/gepi.20604. Epub 2011 Jul 18.

Abstract

Accurate assignment of copy number at known copy number variant (CNV) loci is important both for understanding the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but for a number of reasons assigning copy number is much more challenging. CNV assays have lower signal-to-noise ratios than SNP assays, often display heavy-tailed and asymmetric intensity distributions, contain outlying observations, and may exhibit systematic technical differences among cohorts. In addition, the number of copy-number classes at a CNV in the population may be unknown a priori. Because of these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We have developed a copy number assignment algorithm, CNVCALL, for a targeted CNV array, such as that used by the Wellcome Trust Case Control Consortium's recent CNV association study. We use a Bayesian hierarchical mixture model that robustly identifies both the number of distinct copy number classes at a specific locus and the relative copy number of each individual in the sample. This approach is fully automated, which is a critical requirement when analyzing large numbers of CNVs. We illustrate the method's performance using real data from the Wellcome Trust Case Control Consortium's CNV association study and using simulated data.
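
To make the clustering framing concrete, the following is a minimal sketch and not CNVCALL itself. It swaps in a much simpler technique: a one-dimensional Gaussian mixture fitted by EM, with the number of classes chosen by BIC instead of the Bayesian hierarchical model the abstract describes. It ignores the heavy tails, asymmetry, outliers, and cohort effects that the abstract identifies as the hard part. All function names, the synthetic intensities, and the parameter defaults are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(xs, k, iters=200, min_sd=0.01):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(xs)
    ordered = sorted(xs)
    # Start the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sds = [max(grand_sd / k, min_sd)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), min_sd)  # floor avoids collapsed classes

    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sds)) or 1e-300)
        for x in xs
    )
    # A k-class mixture has 3k - 1 free parameters.
    bic = -2.0 * log_lik + (3 * k - 1) * math.log(n)
    return {"k": k, "weights": weights, "means": means, "sds": sds, "bic": bic}


def choose_k(xs, max_k=4):
    """Return the fit with the lowest BIC among 1..max_k classes."""
    fits = [fit_gmm_1d(xs, k) for k in range(1, max_k + 1)]
    return min(fits, key=lambda f: f["bic"])


def demo_intensities():
    """Idealised synthetic intensities for three hypothetical copy-number classes."""
    xs = []
    for size, centre in [(100, 0.0), (200, 1.0), (100, 2.0)]:
        dist = NormalDist(centre, 0.1)
        # Evenly spaced quantiles give a deterministic, noise-free sample.
        xs.extend(dist.inv_cdf((i + 0.5) / size) for i in range(size))
    return xs


best = choose_k(demo_intensities())
print(best["k"], [round(m, 2) for m in sorted(best["means"])])
```

On real array data a Gaussian likelihood of this kind would be sensitive to exactly the outliers and skewed intensities the abstract mentions, which is one motivation for a more robust hierarchical model.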

Similar articles

The effect of algorithms on copy number variant detection.
PLoS One. 2010 Dec 30;5(12):e14456. doi: 10.1371/journal.pone.0014456.

Cited by

Generalized species sampling priors with latent Beta reinforcements.
J Am Stat Assoc. 2014 Dec 1;109(508):1466-1480. doi: 10.1080/01621459.2014.950735.

Modified screening and ranking algorithm for copy number variation detection.
Bioinformatics. 2015 May 1;31(9):1341-8. doi: 10.1093/bioinformatics/btu850. Epub 2014 Dec 25.
