Statistical challenges associated with detecting copy number variations with next-generation sequencing.

Affiliation

Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117597.

Publication details

Bioinformatics. 2012 Nov 1;28(21):2711-8. doi: 10.1093/bioinformatics/bts535. Epub 2012 Aug 31.

DOI: 10.1093/bioinformatics/bts535

PMID: 22942022

Abstract

MOTIVATION

Analysing next-generation sequencing (NGS) data to detect copy number variations (CNVs) is a relatively new and challenging field, with no accepted standard protocols or quality control measures so far. Several algorithms have by now been developed for each of the four broad methods of CNV detection using NGS, namely the depth of coverage (DOC), read-pair, split-read and assembly-based methods. However, because of the complexity of the genome and the short read lengths of NGS technology, many challenges remain in analysing NGS data for CNVs, no matter which method or algorithm is used.
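
The read-pair approach named above rests on comparing each pair's mapped insert size with the library's expected insert size. The Python sketch below illustrates that idea under simplifying assumptions: insert sizes have already been extracted from aligned pairs, the insert-size distribution is roughly normal, and a pair is flagged when it deviates from the median by more than `k` robust standard deviations (1.4826 × MAD). The function name `classify_pairs` and the default `k=3.0` are hypothetical choices, not taken from the review or from any specific tool. Real read-pair callers also use read orientation and cluster discordant pairs before calling a CNV.

```python
import statistics


def classify_pairs(insert_sizes, k=3.0):
    """Label read pairs by insert size (hypothetical helper, illustration only).

    A pair mapping much further apart than expected suggests a deletion in the
    sample relative to the reference; much closer suggests an insertion.
    Orientation is ignored here for simplicity.
    """
    med = statistics.median(insert_sizes)
    # Median absolute deviation, scaled to approximate an SD under normality.
    mad = statistics.median(abs(x - med) for x in insert_sizes)
    scale = 1.4826 * mad
    labels = []
    for x in insert_sizes:
        if x > med + k * scale:
            labels.append("deletion")
        elif x < med - k * scale:
            labels.append("insertion")
        else:
            labels.append("concordant")
    return labels
```

For example, with insert sizes `[300, 310, 290, 305, 295, 1500, 100]` the median is 300 and the MAD is 10, so the 1500 bp pair is labelled `deletion`, the 100 bp pair `insertion`, and the rest `concordant`. The median and MAD are used instead of the mean and SD so that the discordant pairs themselves do not inflate the threshold.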

RESULTS

In this review, we describe and discuss potential sources of bias in CNV detection for each of the four methods. In particular, we focus on issues pertaining to (i) mappability, (ii) GC-content bias, (iii) quality control measures for reads and (iv) the difficulty of identifying duplications. To gain insight into some of these issues, we also download real data from the 1000 Genomes Project and analyse its DOC data. We show examples of how reads in repeated regions can affect CNV detection, demonstrate current GC-correction algorithms, investigate the sensitivity of a DOC algorithm before and after quality control of reads, and discuss why duplications are harder to detect than deletions.
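
DOC analyses of the kind described here typically start from read counts in fixed genomic windows, after discarding poorly mapped reads and correcting for GC content. The sketch below illustrates both steps under stated assumptions: read start positions and MAPQ values are given as plain lists (in practice they would be read from a BAM file), a MAPQ cut-off of 20 stands in for the read quality control and mappability filtering discussed in the review, and GC correction uses a simple median-based scheme in which each bin is scaled by the overall median count divided by the median count of bins with the same rounded GC percentage. The function names `bin_counts` and `gc_correct` and their default parameters are hypothetical, and this scheme is not necessarily one of the GC-correction algorithms evaluated in the review.

```python
import statistics
from collections import defaultdict


def bin_counts(read_starts, mapqs, chrom_length, bin_size=1000, min_mapq=20):
    """Count reads per fixed-size bin (hypothetical helper, illustration only).

    Reads with MAPQ below min_mapq are skipped: a low MAPQ usually means the
    read could align equally well elsewhere (e.g. in a repeat), so this acts
    as a crude mappability / read-QC filter.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos, q in zip(read_starts, mapqs):
        if q >= min_mapq and 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def gc_correct(counts, gc_fractions):
    """Median-based GC correction (assumed scheme, illustration only).

    corrected_i = count_i * (overall median) / (median of bins in the same
    1% GC stratum as bin i).
    """
    overall = statistics.median(counts)
    strata = defaultdict(list)
    for c, g in zip(counts, gc_fractions):
        strata[round(g * 100)].append(c)  # group bins by GC percentage
    stratum_median = {k: statistics.median(v) for k, v in strata.items()}
    corrected = []
    for c, g in zip(counts, gc_fractions):
        m = stratum_median[round(g * 100)]
        # Leave bins untouched if their stratum has no reads at all.
        corrected.append(c * overall / m if m > 0 else float(c))
    return corrected
```

Applied to counts `[100, 110, 90, 60, 66, 54]` with GC fractions of 0.40 for the first three bins and 0.60 for the last three, `gc_correct` rescales both GC strata to the same median of 78. Depth differences driven by GC content alone are thereby removed before any segmentation or CNV calling.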

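One simple way to see why duplications give a weaker DOC signal than deletions is to compare expected depth ratios. This is offered as an illustration, not as the review's full argument. Assume a diploid region in which each copy contributes an expected depth of $\lambda$ reads per window, so the normal state has depth $2\lambda$:

```latex
\text{heterozygous deletion: } \log_2\frac{\lambda}{2\lambda} = -1,
\qquad
\text{single-copy duplication: } \log_2\frac{3\lambda}{2\lambda} = \log_2 1.5 \approx 0.585
```

Both events change the expected count by the same absolute amount $\lambda$, but the duplication's log-ratio shift is only about 0.58 in magnitude versus 1 for the deletion, and Poisson counting noise at depth $3\lambda$ is larger than at $\lambda$. A duplication of the same size is therefore harder to separate from the normal state, and higher-order gains shrink the ratio further (from 4 to 5 copies, $\log_2(5/4) \approx 0.32$).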

Similar articles

CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing.

J Mol Diagn. 2016 Nov;18(6):872-881. doi: 10.1016/j.jmoldx.2016.07.001. Epub 2016 Sep 3.

Noise cancellation using total variation for copy number variation detection.

BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.

Copy Number Variations Detection: Unravelling the Problem in Tangible Aspects.

IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1237-1250. doi: 10.1109/TCBB.2016.2576441. Epub 2016 Jun 7.

CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data.

PLoS One. 2015 Aug 20;10(8):e0135895. doi: 10.1371/journal.pone.0135895. eCollection 2015.

Exome sequence read depth methods for identifying copy number changes.

Brief Bioinform. 2015 May;16(3):380-92. doi: 10.1093/bib/bbu027. Epub 2014 Aug 28.

MGP-HMM: Detecting genome-wide CNVs using an HMM for modeling mate pair insertion sizes and read counts.

Math Biosci. 2016 Sep;279:53-62. doi: 10.1016/j.mbs.2016.07.006. Epub 2016 Jul 16.

De novo detection of copy number variation by co-assembly.

Bioinformatics. 2012 Dec 15;28(24):3195-202. doi: 10.1093/bioinformatics/bts601. Epub 2012 Oct 9.

Effective normalization for copy number variation detection from whole genome sequencing.

BMC Genomics. 2012;13 Suppl 6(Suppl 6):S16. doi: 10.1186/1471-2164-13-S6-S16. Epub 2012 Oct 26.

iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization.

PLoS One. 2018 Apr 5;13(4):e0195334. doi: 10.1371/journal.pone.0195334. eCollection 2018.

Cited by

Detection and functional assessment of structural variants using whole-genome re-sequencing data in Nellore cattle.

Sci Rep. 2025 Aug 19;15(1):30364. doi: 10.1038/s41598-025-14139-0.

The CNV map construction and ROH analysis of Pinan cattle.

BMC Genomics. 2025 May 14;26(1):480. doi: 10.1186/s12864-025-11626-6.

Benchmarking of germline copy number variant callers from whole genome sequencing data for clinical applications.

Bioinform Adv. 2025 Apr 10;5(1):vbaf071. doi: 10.1093/bioadv/vbaf071. eCollection 2025.

Benchmarking strategies for CNV calling from whole genome bisulfite data in humans.

Comput Struct Biotechnol J. 2025 Mar 6;27:912-919. doi: 10.1016/j.csbj.2025.02.040. eCollection 2025.

Multi-tool copy number detection highlights common body size-associated variants in miniature pig breeds from different geographical regions.

BMC Genomics. 2025 Mar 22;26(1):285. doi: 10.1186/s12864-025-11446-8.

The genetic puzzle of multicopy genes: challenges and troubleshooting.

Plant Methods. 2025 Mar 7;21(1):32. doi: 10.1186/s13007-025-01329-0.

Detecting gene copy number alterations by Oncomine Comprehensive genomic profiling in a comparative study on FFPE tumor samples.

Sci Rep. 2025 Feb 5;15(1):4314. doi: 10.1038/s41598-025-88494-3.

Regional Hereditary Cancer Program in Chile: A scalable model of genetic counseling and molecular diagnosis to improve clinical outcomes for patients with hereditary cancer across Latin America.

Biol Res. 2024 Dec 23;57(1):99. doi: 10.1186/s40659-024-00579-x.

A comprehensive map of copy number variations in dromedary camels based on whole genome sequence data.

Sci Rep. 2024 Oct 26;14(1):25573. doi: 10.1038/s41598-024-77773-0.

Fc gamma receptors: Their evolution, genomic architecture, genetic variation, and impact on human disease.

Immunol Rev. 2024 Nov;328(1):65-97. doi: 10.1111/imr.13401. Epub 2024 Sep 30.
