A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

Affiliation

London School of Hygiene and Tropical Medicine, London, UK.

Publication information

BMC Genomics. 2013 Feb 26;14:128. doi: 10.1186/1471-2164-14-128.

DOI:10.1186/1471-2164-14-128

PMID:23442253

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3679970/

Abstract

BACKGROUND

The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model.
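As a rough illustration of the coverage-thresholding idea described above, the following Python sketch fits the marginal distribution of a Poisson-Gamma model (a negative binomial) to baseline window counts by the method of moments, then flags windows whose coverage falls in either tail. It is not the authors' implementation: the function names, the moment-based fit, the example counts and the `alpha` cut-off are all assumptions made for illustration.

```python
import math
from statistics import mean, pvariance


def fit_negbin_moments(counts):
    """Method-of-moments fit of a negative binomial (Poisson-Gamma marginal).

    Returns (r, p): Gamma shape r and success probability p, so that the
    fitted mean is r * (1 - p) / p.
    """
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        raise ValueError("no overdispersion detected; a plain Poisson may suffice")
    r = m * m / (v - m)
    p = r / (r + m)
    return r, p


def negbin_pmf(k, r, p):
    """Negative binomial probability of observing count k."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def negbin_quantile(q, r, p, max_k=100_000):
    """Smallest k whose cumulative probability reaches q (capped at max_k)."""
    k = 0
    cdf = negbin_pmf(0, r, p)
    while cdf < q and k < max_k:
        k += 1
        cdf += negbin_pmf(k, r, p)
    return k


def flag_windows(counts, r, p, alpha=0.001):
    """Label each window by comparing its count with two-sided tail cut-offs."""
    lo = negbin_quantile(alpha / 2, r, p)
    hi = negbin_quantile(1 - alpha / 2, r, p)
    calls = []
    for c in counts:
        if c < lo:
            calls.append("deletion")
        elif c > hi:
            calls.append("amplification")
        else:
            calls.append("normal")
    return calls


# Hypothetical per-window read counts from a region assumed to be copy-neutral.
baseline = [18, 25, 31, 22, 40, 27, 35, 20, 29, 33]
r, p = fit_negbin_moments(baseline)

# Hypothetical test windows: no coverage, typical coverage, very high coverage.
print(flag_windows([0, 28, 150], r, p))
```

A moment-based fit is used here only because it needs no optimiser; a likelihood-based or fully Bayesian fit, as the term "hierarchical model" suggests, would be the more natural choice in practice.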

RESULTS

Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates.
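One simple way to check for the overdispersion and asymmetry mentioned above is to compare the sample variance with the mean (the two are equal under a Poisson model) and to look at the sign of the skewness. The sketch below is a minimal illustration under that assumption; the function names and the example counts are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion
    relative to a Poisson model."""
    return variance(counts) / mean(counts)


def skewness(counts):
    """Population skewness; positive values indicate a longer right tail."""
    m = mean(counts)
    n = len(counts)
    m2 = sum((c - m) ** 2 for c in counts) / n
    m3 = sum((c - m) ** 3 for c in counts) / n
    return m3 / m2 ** 1.5


# Hypothetical per-window coverage counts with a few high-coverage windows.
coverage = [2, 30, 5, 60, 3]
print("dispersion index:", dispersion_index(coverage))
print("skewness:", skewness(coverage))
```

A dispersion index near 1 would be compatible with a plain Poisson model, whereas a much larger value is the kind of pattern that motivates adding a Gamma or Lognormal layer on the Poisson rate.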

CONCLUSIONS

In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/556e/3679970/5bf6d30d67d7/1471-2164-14-128-1.jpg
