Liu Yichen Henry, Grubbs Griffin L, Zhang Lu, Fang Xiaodong, Dill David L, Sidow Arend, Zhou Xin
Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA.
Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA.
Bioinform Adv. 2021 Jun 16;1(1):vbab007. doi: 10.1093/bioadv/vbab007. eCollection 2021.
Identifying structural variants (SVs) is critical to understanding health and disease, but detecting them remains a challenge. Several linked-read sequencing technologies, including 10X Genomics, TELL-Seq and single tube long fragment read (stLFR), have recently been developed as cost-effective approaches to reconstruct multi-megabase haplotypes (phase blocks) from the sequence data of a single sample. These technologies provide a well-suited sequencing platform for characterizing SVs, though few computational algorithms can utilize them. We therefore developed Aquila_stLFR, an approach that resolves SVs through haplotype-based assembly of stLFR linked-reads.
Aquila_stLFR first partitions long fragment reads into two haplotype-specific blocks with the assistance of a high-quality reference genome, taking advantage of the phasing information carried by the linked-reads themselves. Each haplotype is then assembled independently, yielding a complete diploid assembly from which genome-wide SVs are reconstructed. We benchmarked Aquila_stLFR on a well-studied sample, NA24385, and showed that it can detect medium to large deletions (50 bp-10 kb) with high sensitivity and medium-size insertions (50 bp-1 kb) with high specificity.
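The partitioning step can be illustrated with a toy sketch. This is not the Aquila_stLFR implementation: the function names, the input layout (barcode mapped to alleles observed at SNP positions), the pre-phased SNP table and the `min_votes` threshold are all assumptions made for illustration. Each barcoded fragment is assigned to the haplotype whose alleles it matches at more heterozygous sites, and fragments with too little or conflicting evidence are left unassigned.

```python
from collections import defaultdict

def assign_haplotype(fragment_alleles, phased_snps, min_votes=2):
    """Return 1, 2, or None for one fragment (hypothetical helper).

    fragment_alleles: {position: allele observed on this fragment}
    phased_snps:      {position: (hap1_allele, hap2_allele)}
    """
    votes = [0, 0]
    for pos, allele in fragment_alleles.items():
        if pos not in phased_snps:
            continue  # not a known heterozygous site
        hap1_allele, hap2_allele = phased_snps[pos]
        if allele == hap1_allele:
            votes[0] += 1
        elif allele == hap2_allele:
            votes[1] += 1
    # Too little evidence, or a tie: leave the fragment unassigned.
    if max(votes) < min_votes or votes[0] == votes[1]:
        return None
    return 1 if votes[0] > votes[1] else 2

def partition_fragments(fragments, phased_snps, min_votes=2):
    """Group fragment barcodes by assigned haplotype (1, 2, or None)."""
    bins = defaultdict(list)
    for barcode, alleles in fragments.items():
        bins[assign_haplotype(alleles, phased_snps, min_votes)].append(barcode)
    return dict(bins)

# Made-up example data: three heterozygous SNPs and three barcoded fragments.
phased_snps = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
fragments = {
    "bc1": {100: "A", 250: "C"},  # matches haplotype 1 at two sites
    "bc2": {100: "G", 400: "A"},  # matches haplotype 2 at two sites
    "bc3": {250: "C"},            # only one informative site
}
partition = partition_fragments(fragments, phased_snps)
print(partition)
```

In this sketch, the reads from each haplotype bin would then be passed to a separate assembly run, so that the two haplotypes are assembled independently as described above.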
Source code and documentation are available on https://github.com/maiziex/Aquila_stLFR.
Supplementary data are available online.