Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
School of Life Sciences, College of Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV, 89154, USA.
Genome Biol. 2022 Apr 28;23(1):108. doi: 10.1186/s13059-022-02670-6.
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat, which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy than competing methods in quantifying repeats in long STRs. We also evaluate DeepRepeat on genome-wide or candidate-region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods that rely on basecalled reads.
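To illustrate the general idea of turning an ionic current trace into color channels, here is a minimal Python sketch. It is not DeepRepeat's published encoding. The function names (`normalize`, `resample`, `to_byte`, `encode_rgb`), the median/MAD normalization, the clipping range of ±5, the fixed width, and the choice of assigning the signal of the previous, current, and next repeat unit to the red, green, and blue channels are all assumptions made for illustration. For simplicity the result is a single-row "image" of RGB pixels.

```python
from statistics import median
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def normalize(signal: Sequence[float]) -> List[float]:
    """Median/MAD normalization over the whole region (assumed preprocessing)."""
    med = median(signal)
    mad = median(abs(x - med) for x in signal) or 1.0  # avoid division by zero
    return [(x - med) / mad for x in signal]


def resample(segment: Sequence[float], width: int) -> List[float]:
    """Nearest-neighbour resampling of a segment to a fixed number of columns."""
    n = len(segment)
    if n == 0:
        raise ValueError("empty segment")
    return [segment[min(n - 1, (i * n) // width)] for i in range(width)]


def to_byte(values: Sequence[float], lo: float = -5.0, hi: float = 5.0) -> List[int]:
    """Clip values to [lo, hi] and scale them linearly to the 0..255 range."""
    out = []
    for v in values:
        v = min(max(v, lo), hi)
        out.append(round((v - lo) / (hi - lo) * 255))
    return out


def encode_rgb(signal: Sequence[float],
               bounds: Sequence[Tuple[int, int]],
               width: int = 32) -> List[Pixel]:
    """Hypothetical encoding: bounds holds three (start, end) index pairs for the
    previous, current, and next repeat unit; each becomes one color channel."""
    if len(bounds) != 3:
        raise ValueError("expected exactly three segments (R, G, B)")
    norm = normalize(signal)
    channels = [to_byte(resample(norm[s:e], width)) for s, e in bounds]
    # zip the three channels column by column into (R, G, B) pixels
    return list(zip(*channels))


# Usage: three units with distinct current levels (synthetic values).
trace = [100.0] * 10 + [110.0] * 10 + [90.0] * 10
image = encode_rgb(trace, [(0, 10), (10, 20), (20, 30)], width=4)
print(image)
```

Each column becomes one RGB pixel, so a unit whose current level differs from its neighbours appears as a color shift. A real pipeline would pass such arrays, typically 2-D, to a convolutional classifier that decides whether the central unit is a repeat.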