A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

Affiliation

London School of Hygiene and Tropical Medicine, London, UK.

Publication information

BMC Genomics. 2013 Feb 26;14:128. doi: 10.1186/1471-2164-14-128.

Abstract

BACKGROUND

The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in the genomes of micro-organisms of public health importance. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the resulting coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods either rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning of the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and the Poisson-Lognormal, which are flexible enough to describe different data patterns while remaining robust to deviations from the commonly assumed Poisson model.
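
The abstract does not spell out how the models are fitted or how CNVs are called, so the following is only a minimal sketch of the general idea, not the authors' procedure. It relies on the standard result that a Poisson model whose rate is Gamma-distributed has a negative binomial marginal. Assumptions, all hypothetical: coverage has already been summarised as read counts per genomic window; the Poisson-Gamma model is fitted to a baseline set of windows by the method of moments; and a window is called a deletion or amplification when its lower or upper tail probability falls below a fixed cutoff `alpha`. The function names and the default cutoff are illustrative. The Poisson-Lognormal model has no closed-form marginal and is not covered here.

```python
import math
from statistics import mean, variance


def fit_poisson_gamma(baseline):
    """Method-of-moments fit of a Poisson-Gamma model to baseline window counts.

    The Poisson-Gamma marginal is negative binomial NB(r, p) with mean
    r * (1 - p) / p and variance mean + mean**2 / r.
    """
    m = mean(baseline)
    v = variance(baseline)  # sample variance
    if v <= m:
        raise ValueError("no overdispersion: variance must exceed the mean")
    r = m * m / (v - m)  # Gamma shape parameter
    p = r / (r + m)      # negative binomial success probability
    return r, p


def nb_pmf(k, r, p):
    """Negative binomial probability P(X = k), computed on the log scale."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def nb_cdf(k, r, p):
    """P(X <= k); returns 0.0 for k < 0."""
    return sum(nb_pmf(i, r, p) for i in range(k + 1))


def call_windows(counts, r, p, alpha=0.001):
    """Label each window count as "deletion", "amplification", or None.

    Hypothetical calling rule: a fixed cutoff on one-sided tail probabilities.
    """
    calls = []
    for c in counts:
        lower = nb_cdf(c, r, p)            # P(X <= c): too-low coverage
        upper = 1.0 - nb_cdf(c - 1, r, p)  # P(X >= c): too-high coverage
        if lower < alpha:
            calls.append("deletion")
        elif upper < alpha:
            calls.append("amplification")
        else:
            calls.append(None)
    return calls


if __name__ == "__main__":
    # Toy counts for illustration only, not data from the paper.
    baseline = [20, 30, 40, 25, 35, 30, 15, 45, 28, 32, 22, 38]
    r, p = fit_poisson_gamma(baseline)
    print(call_windows([0, 30, 150], r, p))
```

Fitting on a separate baseline, rather than on the windows being tested, keeps a single extreme window from inflating the variance estimate and masking itself; a real pipeline would also need to handle GC bias, mappability and multiple testing, none of which this sketch attempts.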

RESULTS

Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates.
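
As a rough illustration of what "asymmetric and overdispersed in relation to the Poisson model" means, the hypothetical helper below compares per-window coverage counts with two Poisson benchmarks: a variance-to-mean ratio (index of dispersion) of 1, and a skewness of 1/sqrt(mean). It is a descriptive check that assumes coverage is summarised as counts per window; it is not the analysis reported in the paper.

```python
import math
from statistics import mean, pvariance


def dispersion_summary(counts):
    """Compare per-window coverage counts with Poisson benchmarks.

    Under a Poisson model the variance equals the mean (dispersion index 1)
    and the skewness is 1 / sqrt(mean).
    """
    m = mean(counts)
    v = pvariance(counts, m)  # population variance around the mean
    if v == 0:
        raise ValueError("counts are constant; skewness is undefined")
    sd = math.sqrt(v)
    skew = sum((c - m) ** 3 for c in counts) / (len(counts) * sd ** 3)
    return {
        "mean": m,
        "dispersion_index": v / m,
        "skewness": skew,
        "poisson_skewness": 1 / math.sqrt(m),
    }


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(dispersion_summary([10, 10, 10, 10, 60]))
```

On the toy counts [10, 10, 10, 10, 60] it reports a dispersion index of 20.0 and a skewness of 1.5, far above the Poisson values of 1 and about 0.22.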

CONCLUSIONS

In summary, the proposed methodology increases the flexibility, robustness, accuracy and statistical rigour of CNV detection using sequence coverage data.
