a Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany.
b Department of Stress Neurobiology and Neurogenetics, Munich, Germany.
Nucleus. 2017 Jul 4;8(4):370-380. doi: 10.1080/19491034.2017.1320461. Epub 2017 Apr 27.
Different types of sequencing bias have been described, and subsequently reduced, for a variety of sequencing systems, with most work focusing on the widely used Illumina systems. Similar studies are missing for the SOLiD 5500xl system, a sequencer that produced many of the data sets available to researchers today. Describing and understanding the bias is important for accurately interpreting these published data and integrating them into ongoing research projects. We report a particularly strong GC bias for this sequencing system, observed when a defined gDNA mix of 5 microbes with a wide range of GC contents (20-72%) was sequenced and compared with the expected distribution and with Illumina MiSeq data from the same DNA pool. Since we observed this bias already under PCR-free conditions, changing the PCR conditions during library preparation, a common strategy to handle bias in the Illumina system, was not relevant. The source of the bias appeared to be an uneven heat distribution during the SOLiD emulsion PCR (ePCR), which is used to enrich libraries prior to loading, since performing ePCR in either small pouches or 96-well plates reduced the GC bias. Sequencing of chromatin immunoprecipitated DNA (ChIP-seq) is a common approach in epigenetics. ChIP-seq of the mixed source histone mark H3K9ac (acetyl histone H3 lysine 9), which is typically found on promoter regions and gene bodies, including CpG islands, resulted in a major loss of reads at GC-rich loci (GC content ≥ 62%) when performed on a SOLiD 5500xl machine; this loss was not explained by low sequencing depth. The loss was reduced by adaptations of the ePCR.
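The comparison against an expected distribution can be illustrated with a minimal sketch. It assumes that GC bias is summarized as the ratio of each genome's observed read share to its expected share in the DNA pool; the abstract does not state how the authors quantified the bias, so this is not their method. The function names `gc_content` and `observed_over_expected`, the genome labels, and all numbers are hypothetical and are not taken from the study.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


def observed_over_expected(read_counts: dict, expected_fractions: dict) -> dict:
    """Return observed read share divided by expected share, per genome.

    A ratio near 1 means the genome is represented as expected; a ratio
    well below 1 means it is under-represented in the sequencing data.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no reads counted")
    return {
        name: (read_counts[name] / total_reads) / expected_fractions[name]
        for name in expected_fractions
    }


if __name__ == "__main__":
    # Hypothetical two-genome pool mixed in equal parts (illustrative only).
    counts = {"low_gc_genome": 700_000, "high_gc_genome": 300_000}
    expected = {"low_gc_genome": 0.5, "high_gc_genome": 0.5}
    for name, ratio in observed_over_expected(counts, expected).items():
        print(f"{name}: observed/expected = {ratio:.2f}")
```

Plotting such ratios against each genome's GC content, for both sequencing platforms, would give one simple view of how strongly representation depends on GC content.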