Liu Daniel
Torrey Pines High School, San Diego, CA, USA.
Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA.
PeerJ. 2019 Jun 19;7:e7170. doi: 10.7717/peerj.7170. eCollection 2019.
Next-generation sequencing technologies produce large volumes of multiplexed DNA sequences that must be preprocessed before any downstream analysis. This preprocessing includes demultiplexing and trimming the sequences. Although many existing tools handle these steps, they cannot easily be extended to new sequence schematics when new pipelines are developed. We present Fuzzysplit, a tool that uses a simple declarative language to describe sequence schematics, which makes it highly adaptable to different use cases. In this paper, we explain the matching algorithms behind Fuzzysplit and provide a preliminary comparison of its performance with other well-established tools. Overall, we find that its matching accuracy is comparable to that of existing tools.
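To make the demultiplexing and trimming steps concrete, the sketch below fuzzy-matches each known barcode against the start of a read using edit (Levenshtein) distance, assigns the read to the closest barcode, and trims the matched prefix. This is not Fuzzysplit's declarative language or its actual matching algorithm. The function names `best_prefix_match` and `demultiplex`, the assumption that barcodes sit at the very start of each read, the `"unmatched"` bin, and the rule that ties between barcodes are left unassigned are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def best_prefix_match(pattern: str, read: str, max_edits: int) -> Optional[Tuple[int, int]]:
    """Fuzzy-match `pattern` against the start of `read` (hypothetical helper).

    Returns (edits, end) for the prefix read[:end] with the smallest edit
    distance to `pattern`, or None if every prefix needs more than
    `max_edits` edits. Ties favour the shorter prefix.
    """
    m = len(pattern)
    # Only prefixes up to m + max_edits characters can be within max_edits edits.
    limit = min(len(read), m + max_edits)
    # prev[i] = edit distance between pattern[:i] and the read prefix seen so far.
    prev = list(range(m + 1))
    best: Optional[Tuple[int, int]] = (prev[m], 0) if prev[m] <= max_edits else None
    for j in range(1, limit + 1):
        cur = [j] + [0] * m
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == read[j - 1] else 1
            cur[i] = min(prev[i] + 1,         # extra read character (insertion)
                         cur[i - 1] + 1,      # missing pattern character (deletion)
                         prev[i - 1] + cost)  # match or substitution
        if cur[m] <= max_edits and (best is None or cur[m] < best[0]):
            best = (cur[m], j)
        prev = cur
    return best


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                max_edits: int = 1) -> Dict[str, List[str]]:
    """Bin each read by its closest barcode and trim that barcode off.

    Reads with no barcode within max_edits, or with a tie between the two
    best barcodes, go to a hypothetical "unmatched" bin.
    """
    bins: Dict[str, List[str]] = {name: [] for name in barcodes}
    bins["unmatched"] = []
    for read in reads:
        hits = []
        for name, seq in barcodes.items():
            match = best_prefix_match(seq, read, max_edits)
            if match is not None:
                hits.append((match[0], name, match[1]))
        hits.sort()
        if not hits or (len(hits) > 1 and hits[0][0] == hits[1][0]):
            bins["unmatched"].append(read)
        else:
            _, name, end = hits[0]
            bins[name].append(read[end:])  # keep only the sequence after the barcode
    return bins


if __name__ == "__main__":
    barcodes = {"sample_A": "ACGTAC", "sample_B": "TTGCAA"}
    reads = ["ACGTACGGGCCC", "ACGAACGGGCCC", "TTGCAAT", "GGGGGGGG"]
    for name, trimmed in demultiplex(reads, barcodes).items():
        print(name, trimmed)
```

Here the second read carries one substitution in its barcode and is still assigned to `sample_A`, while the last read matches no barcode. A real tool would also need to handle barcodes that are not at a fixed position, quality scores, and paired reads, which is the kind of variation a declarative schematic is meant to capture.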