CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing.

Author information

Talevich Eric, Shain A Hunter, Botton Thomas, Bastian Boris C

Affiliations

Department of Dermatology, University of California, San Francisco, San Francisco, California, United States of America.

Department of Pathology, University of California, San Francisco, San Francisco, California, United States of America.

Publication information

PLoS Comput Biol. 2016 Apr 21;12(4):e1004873. doi: 10.1371/journal.pcbi.1004873. eCollection 2016 Apr.

Abstract

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number genome-wide at a resolution equivalent to 100 kilobases from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We evaluated CNVkit's performance against copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for integration into existing analysis pipelines. CNVkit is freely available from https://github.com/etal/cnvkit.
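The abstract describes two core steps: normalizing binned read depths to a pooled reference built from normal samples, then correcting the resulting log2 ratios for biases such as GC content. The following is a minimal, illustrative sketch of those two ideas in plain Python. It is not CNVkit's code or API: the function names are hypothetical, every bin is assumed to have positive depth, and bias correction is done by subtracting a rolling median of the log2 ratios with bins ordered by a covariate (for example, GC fraction), which is one common way to remove such trends. CNVkit's own implementation may differ in its details.

```python
import math
import statistics
from typing import List, Sequence


def pooled_reference(normal_depths: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: per-bin median depth across a panel of normals.

    Each normal sample is first divided by its own median depth so that
    samples sequenced to different overall depths contribute comparably.
    Assumes every sample has a nonzero median depth.
    """
    scaled = [[d / statistics.median(sample) for d in sample]
              for sample in normal_depths]
    n_bins = len(scaled[0])
    return [statistics.median(s[i] for s in scaled) for i in range(n_bins)]


def log2_ratios(sample: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Hypothetical helper: log2(sample / reference) for each bin.

    The sample is median-centered first. Assumes all depths are positive;
    a real tool must mask or otherwise handle zero-coverage bins.
    """
    med = statistics.median(sample)
    return [math.log2((s / med) / r) for s, r in zip(sample, reference)]


def rolling_median_correct(log2s: Sequence[float], covariate: Sequence[float],
                           window: int = 5) -> List[float]:
    """Hypothetical helper: remove a bias trend from log2 ratios.

    Bins are sorted by a bias covariate (e.g. GC fraction); the median of a
    sliding window over that ordering estimates the trend, which is then
    subtracted from each bin. The median is robust to a few real copy
    number changes inside the window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    order = sorted(range(len(log2s)), key=lambda i: covariate[i])
    half = window // 2
    corrected = [0.0] * len(log2s)
    for rank, idx in enumerate(order):
        # Window is truncated at both ends of the covariate ordering.
        lo, hi = max(0, rank - half), min(len(order), rank + half + 1)
        trend = statistics.median(log2s[order[j]] for j in range(lo, hi))
        corrected[idx] = log2s[idx] - trend
    return corrected
```

In principle the same correction function could be applied in turn with other per-bin covariates, such as a repeat fraction or target size, mirroring the three bias sources named in the abstract. This sketch treats all bins alike and does not distinguish on-target from off-target bins, which the abstract's method combines.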

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a00/4839673/3e99adee3912/pcbi.1004873.g001.jpg