Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA.
Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Frederick, MD, 21702, USA.
BMC Bioinformatics. 2023 Jan 27;23(Suppl 8):568. doi: 10.1186/s12859-023-05139-w.
Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. A deletion is a type of SV in which part of a chromosome or a DNA sequence is lost during DNA replication. Three types of signals, namely discordant read-pairs, read depth and split reads, are commonly used to detect SVs from high-throughput sequencing data. Many tools have been developed that detect SVs by using one or more of these signals.
In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first uses discordant read-pairs and clipped reads to obtain initial deletion candidates, and then clusters similar candidates by using unsupervised learning methods. EigenDel then applies a carefully designed approach to call true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low-coverage sequence data.
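As an illustration of the discordant read-pair signal mentioned above, the following is a minimal sketch and not EigenDel's implementation. It assumes that the library's mean insert size and standard deviation are already known, flags pairs whose insert size exceeds the mean by more than a chosen number of standard deviations, and groups pairs whose implied gaps overlap into deletion candidates. The names `ReadPair`, `is_discordant` and `deletion_candidates`, the 3-SD cutoff and the `min_support` threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom: str
    left_end: int     # last aligned base of the upstream mate
    right_start: int  # first aligned base of the downstream mate
    insert_size: int  # observed outer distance between the two mates


def is_discordant(pair, lib_mean, lib_sd, n_sd=3.0):
    """A pair is discordant here if its insert size is far above the library mean."""
    return pair.insert_size > lib_mean + n_sd * lib_sd


def deletion_candidates(pairs, lib_mean, lib_sd, min_support=2):
    """Group discordant pairs with overlapping gaps into candidate deletions.

    Returns (chrom, start, end, support) tuples, where [start, end] is the
    region shared by the gaps of all supporting pairs.
    """
    gaps = sorted(
        (p.chrom, p.left_end, p.right_start)
        for p in pairs
        if is_discordant(p, lib_mean, lib_sd)
    )
    groups = []  # each entry: [chrom, start, end, support]
    for chrom, start, end in gaps:
        last = groups[-1] if groups else None
        if last and last[0] == chrom and start < last[2]:
            # Gaps are sorted by start, so the shared region begins at the
            # newest start and ends at the smallest end seen so far.
            last[1] = start
            last[2] = min(last[2], end)
            last[3] += 1
        else:
            groups.append([chrom, start, end, 1])
    return [tuple(g) for g in groups if g[3] >= min_support]
```

With a library mean of 300 bp and a standard deviation of 30 bp, three pairs spanning roughly chr1:1000-1950 with insert sizes near 1100 bp would be merged into one candidate, while a single isolated discordant pair would be dropped for lack of support. A real caller would also need clipped-read evidence and a clustering step, which this sketch omits.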
Our results show that EigenDel outperforms other major methods in balancing accuracy and sensitivity and in reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel.