Love Michael I, Myšičková Alena, Sun Ruping, Kalscheuer Vera, Vingron Martin, Haas Stefan A
Max Planck Institute for Molecular Genetics.
Stat Appl Genet Mol Biol. 2011 Nov 8;10(1). doi: 10.2202/1544-6115.1732.
Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project, identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
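The general idea of a hidden Markov model over read counts can be illustrated with a toy sketch. This is not the exomeCopy implementation: the Poisson emission, the copy-number state set, the single `stay` transition probability, and every name below (`call_copy_number`, `background`, `gc_factor`, `floor`) are assumptions made for illustration only. Each window's expected count is taken to be the control-set background depth, scaled by a hypothetical GC correction factor and by the copy-number ratio, and the most likely state path is found with the Viterbi algorithm.

```python
import math


def poisson_logpmf(k, mu):
    """Log probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def call_copy_number(counts, background, gc_factor,
                     states=(0, 1, 2, 3), normal=2, stay=0.99, floor=0.01):
    """Viterbi decoding of copy-number states from per-window read counts.

    Assumptions (illustrative, not taken from the paper):
      - expected count = background * gc_factor * (state / normal)
      - Poisson emissions; `floor` keeps the mean positive for state 0
      - probability `stay` of keeping the same state between windows,
        with the remainder split evenly over the other states
      - background and gc_factor values are strictly positive
    """
    n = len(states)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n - 1))

    def log_emit(t, s):
        ratio = max(s / normal, floor)
        return poisson_logpmf(counts[t], background[t] * gc_factor[t] * ratio)

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    # Uniform prior over states at the first window.
    score = [math.log(1.0 / n) + log_emit(0, s) for s in states]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(t, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back the highest-scoring path.
    idx = max(range(n), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]


if __name__ == "__main__":
    # Made-up counts: twelve windows with roughly half depth in the middle four.
    counts = [98, 103, 101, 97, 52, 49, 47, 51, 102, 99, 100, 104]
    print(call_copy_number(counts, [100.0] * 12, [1.0] * 12))
```

With these made-up counts, the middle four windows sit near half of the background depth, which is the pattern a heterozygous deletion would produce under the assumed model. Scaling the expected count by `gc_factor` means a window with systematically low coverage is not mistaken for a deletion when the covariate already explains the drop.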