Gregory Thomas, Ngankeu Apollinaire, Orwick Shelley, Kautto Esko A, Woyach Jennifer A, Byrd John C, Blachly James S
Division of Hematology, Ohio State University, Columbus, OH 43210, USA.
NAR Genom Bioinform. 2020 Dec;2(4):lqaa070. doi: 10.1093/nargab/lqaa070. Epub 2020 Oct 2.
High-throughput short-read sequencing relies on fragmented DNA for optimal sampling of the input nucleic acid. Several vendors now offer proprietary enzyme cocktails as a cheaper, more streamlined alternative to acoustic shearing for fragmentation. We have discovered that these enzymes induce the formation of library molecules that contain regions of nearby DNA from opposite strands. Sequencing reads derived from these molecules can produce artifactual variant calls at variant allele frequencies below 5%. We present Fragmentation Artifact Detection and Elimination (FADE), software that removes these artifacts from mapped reads and mitigates their effects on downstream analysis. We find that the artifacts principally affect downstream analyses that are sensitive to a 1-3% artifact bias in the sequencing reads, such as targeted resequencing and rare variant discovery.
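To make the artifact structure concrete, the following minimal Python sketch tests whether an unaligned stretch of a read matches the opposite strand of the nearby reference sequence. It is an illustration under stated assumptions, not FADE's actual algorithm: the abstract does not describe how FADE detects artifacts, and the function names, the `min_len` threshold, and the toy sequences are all hypothetical.

```python
# Hypothetical illustration only; this is NOT FADE's implementation.
# Assumption: an artifact-bearing read carries a segment that fails to align
# to the reference in the forward sense but is the reverse complement of
# nearby reference sequence (i.e. it came from the opposite strand).

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def looks_like_opposite_strand_artifact(
    unaligned: str, ref_window: str, min_len: int = 8
) -> bool:
    """Return True if the unaligned bases match the opposite strand nearby.

    unaligned  -- read bases that did not align (hypothetical input)
    ref_window -- forward-strand reference sequence around the read
    min_len    -- assumed minimum length, to avoid short chance matches
    """
    if len(unaligned) < min_len:
        return False
    return reverse_complement(unaligned) in ref_window.upper()


# Toy data (invented for this example, not taken from the paper).
ref_window = "TTGACCGATTACGGCATGCAAGTCCGTA"
artifact_segment = reverse_complement(ref_window[4:16])  # opposite-strand copy
ordinary_segment = "ACACACACACAC"                         # unrelated sequence

print(looks_like_opposite_strand_artifact(artifact_segment, ref_window))
print(looks_like_opposite_strand_artifact(ordinary_segment, ref_window))
```

The exact substring test keeps the sketch short. A practical tool would presumably need to tolerate mismatches and work from real alignments of mapped reads rather than from a fixed reference window.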