Sequana coverage: detection and characterization of genomic variations using running median and mixture models.

Author information

Institut Pasteur - Pole Biomics - 25-28 Rue du Docteur Roux, 75015 Paris, France.

Institut Pasteur - Bioinformatics and Biostatistics Hub - C3BI, USR 3756 IP CNRS - Paris, France.

Publication information

Gigascience. 2018 Dec 1;7(12):giy110. doi: 10.1093/gigascience/giy110.

Abstract

BACKGROUND

In addition to mapping quality information, genome coverage contains valuable biological information such as the presence of repetitive regions, deleted genes, or copy number variations (CNVs). It is essential to take into consideration atypical regions, trends (e.g., origin of replication), and known or unknown biases that influence coverage. It is also important that reported events come with robust statistics (e.g., a z-score) and a precise location.

RESULTS

We provide a stand-alone application, sequana_coverage, that reports genomic regions of interest (ROIs) that are significantly over- or underrepresented in high-throughput sequencing data. Each event is reported with its significance and with characteristics such as the length of the region. The algorithm first detrends the data using an efficient running median algorithm. It then estimates the distribution of the normalized genome coverage with a Gaussian mixture model. Finally, a z-score statistic is assigned to each base position and used to separate the central distribution from the ROIs (i.e., under- and overcovered regions). A double-threshold mechanism is used to cluster the genomic ROIs. HTML reports provide a summary with interactive visual representations of the genomic ROIs, together with standard plots and metrics. Genomic variations such as single-nucleotide variants or CNVs can be effectively identified at the same time.
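The steps described above (detrend, score, cluster) can be sketched in a few lines of Python. This is an illustrative sketch only, not the sequana_coverage implementation. The function names `running_median`, `zscores`, and `find_rois` are hypothetical. A naive running median on a circular sequence replaces the efficient algorithm. A median/MAD estimate of the central distribution stands in for the Gaussian mixture model fit. The thresholds of 4 and 2 and the exact clustering rule are assumptions about how a double-threshold scheme might work.

```python
from statistics import median


def running_median(values, window):
    """Naive centred running median; the sequence is treated as circular
    (an assumption, convenient for a bacterial chromosome). window must be odd."""
    half, n = window // 2, len(values)
    return [median([values[(i + k) % n] for k in range(-half, half + 1)])
            for i in range(n)]


def zscores(coverage, window):
    """Divide coverage by its running median, then score every position.
    Stand-in for the mixture model: centre = median, spread = 1.4826 * MAD."""
    trend = running_median(coverage, window)
    normalized = [c / t if t else 0.0 for c, t in zip(coverage, trend)]
    centre = median(normalized)
    sigma = 1.4826 * median([abs(x - centre) for x in normalized])
    if sigma == 0:
        raise ValueError("coverage too uniform to estimate a spread")
    return [(x - centre) / sigma for x in normalized]


def find_rois(z, high=4.0, low=2.0):
    """Assumed double-threshold rule: a run of same-sign positions beyond
    `low` becomes an ROI only if at least one position exceeds `high`.
    Runs are not joined across the origin in this sketch."""
    rois, i, n = [], 0, len(z)
    while i < n:
        if abs(z[i]) <= low:
            i += 1
            continue
        sign = 1 if z[i] > 0 else -1
        j = i
        while j < n and sign * z[j] > low:
            j += 1
        run = z[i:j]
        if max(abs(v) for v in run) > high:
            rois.append((i, j - 1, sum(run) / len(run)))  # start, end, mean z
        i = j
    return rois


# Toy coverage: a flat baseline near 100x with small periodic noise.
coverage = [98 + 2 * (i % 3) for i in range(60)]
coverage[20:23] = [300, 300, 300]  # strongly over-covered region
coverage[23] = 110                 # mild shoulder next to it
coverage[40:43] = [5, 5, 5]        # nearly deleted region
coverage[50] = 110                 # isolated mild excess

for start, end, mean_z in find_rois(zscores(coverage, window=21)):
    print(start, end, round(mean_z, 1))
```

On this toy input the sketch reports two ROIs, positions 20 to 23 and 40 to 42. The mild shoulder at position 23 is absorbed because it touches a strong event, whereas the equally mild but isolated excess at position 50 is not reported. The window (21) is chosen to be much wider than the events, so that the running median follows the baseline rather than the events themselves.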

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fd3/6275460/075d9b9f8280/giy110fig1.jpg
