Heinz Jakob M, Meyerson Matthew, Li Heng
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, United States.
bioRxiv. 2025 Jul 18:2025.07.15.664946. doi: 10.1101/2025.07.15.664946.
Long-read sequencing data is useful for detecting large and complex structural variants; however, technical artifacts can lead to false structural variant calls. In our analyses, we observed a foldback artifact in long-read data. We therefore developed Breakinator, an open-source tool that flags putative foldback artifact reads as well as previously known chimeric artifacts. Using an alignment-based approach, Breakinator detects artifacts missed by existing quality control tools. We profiled the occurrence of foldback and chimeric reads in both nanopore and single-molecule real-time sequencing data across a range of specimens, library types, sequencing chemistries, sequencing machines, and base-calling software.
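To illustrate what an alignment-based check of this kind might look like, the following is a minimal sketch and not Breakinator's actual implementation. It assumes that a foldback read shows up as two adjacent alignment segments on the same chromosome and opposite strands whose breakpoints lie close together, and that a chimeric read shows up as adjacent segments on different chromosomes or far apart on the same one. The `Segment` type, the `classify_read` function, and both distance thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical, simplified record)."""
    chrom: str
    ref_start: int    # 0-based reference start
    ref_end: int      # reference end (exclusive)
    strand: str       # "+" or "-"
    query_start: int  # offset on the read where this segment begins


def _exit_pos(seg: Segment) -> int:
    # Reference coordinate where the read leaves this segment.
    return seg.ref_end if seg.strand == "+" else seg.ref_start


def _entry_pos(seg: Segment) -> int:
    # Reference coordinate where the read enters this segment.
    return seg.ref_start if seg.strand == "+" else seg.ref_end


def classify_read(
    segments: List[Segment],
    foldback_max_dist: int = 200,       # assumed threshold, not from the source
    chimera_min_dist: int = 1_000_000,  # assumed threshold, not from the source
) -> str:
    """Return "foldback", "chimeric", or "ok" for one read's segments."""
    ordered = sorted(segments, key=lambda s: s.query_start)
    # Inspect each junction between consecutive segments along the read.
    for a, b in zip(ordered, ordered[1:]):
        if a.chrom != b.chrom:
            return "chimeric"
        dist = abs(_exit_pos(a) - _entry_pos(b))
        if a.strand != b.strand and dist <= foldback_max_dist:
            return "foldback"
        if dist >= chimera_min_dist:
            return "chimeric"
    return "ok"
```

In practice the segments would come from the primary and supplementary alignments of each read. A junction flagged this way is only a putative artifact, since a genuine inversion or translocation produces the same alignment pattern.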