Sequencing on the SOLiD 5500xl System - in-depth characterization of the GC bias.

Affiliations

a Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany.

b Department of Stress Neurobiology and Neurogenetics, Munich, Germany.

Publication information

Nucleus. 2017 Jul 4;8(4):370-380. doi: 10.1080/19491034.2017.1320461. Epub 2017 Apr 27.

Abstract

Various sequencing biases have been described, and subsequently reduced, for a range of sequencing systems, with most work focusing on the widely used Illumina platforms. Comparable studies are missing for the SOLiD 5500xl system, a sequencer that produced many of the data sets available to researchers today. Describing and understanding its bias is important for accurately interpreting these published data and integrating them into ongoing research projects. We report a particularly strong GC bias for this sequencing system. The bias was observed when sequencing a defined gDNA mix of 5 microbes spanning a wide range of GC contents (20-72%) and comparing the results with the expected distribution and with Illumina MiSeq data from the same DNA pool. Because the bias was already present under PCR-free conditions, changing the PCR conditions during library preparation, a common strategy for handling bias on the Illumina system, was not relevant. The source of the bias appeared to be an uneven heat distribution during the SOLiD emulsion PCR (ePCR), which is used to enrich libraries prior to loading: performing the ePCR in either small pouches or 96-well plates reduced the GC bias. Sequencing of chromatin-immunoprecipitated DNA (ChIP-seq) is a common approach in epigenetics. ChIP-seq of the mixed-source histone mark H3K9ac (acetylated histone H3 lysine 9), which is typically found at promoter regions and on gene bodies, including CpG islands, resulted in a major loss of reads at GC-rich loci (GC content ≥ 62%) when performed on a SOLiD 5500xl machine. This loss was not explained by low sequencing depth, and it was reduced by adapting the ePCR.
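The comparison of observed reads against an expected distribution can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `gc_content` and `log2_enrichment` are hypothetical, and the choice of a log2 ratio of observed to expected read fractions is an assumption about one reasonable way to quantify over- or under-representation per genome.

```python
import math


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def log2_enrichment(observed_reads: dict, expected_fraction: dict) -> dict:
    """Return log2(observed read fraction / expected fraction) per genome.

    Assumption: `expected_fraction` holds the share of reads each genome
    should receive given the composition of the DNA mix. A value of 0 means
    the genome is represented as expected; negative values mean a loss.
    """
    total = sum(observed_reads.values())
    result = {}
    for name, count in observed_reads.items():
        if count == 0:
            # No reads at all: log2(0) is undefined, so report -infinity.
            result[name] = float("-inf")
        else:
            result[name] = math.log2((count / total) / expected_fraction[name])
    return result
```

In such a sketch, plotting the enrichment of each genome (or genomic window) against its `gc_content` would show whether representation drifts away from 0 as GC content rises or falls.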
