Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-reads.

Authors

Liu Yichen Henry, Grubbs Griffin L, Zhang Lu, Fang Xiaodong, Dill David L, Sidow Arend, Zhou Xin

Affiliations

Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA.

Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA.

Publication information

Bioinform Adv. 2021 Jun 16;1(1):vbab007. doi: 10.1093/bioadv/vbab007. eCollection 2021.

Abstract

MOTIVATION

Identifying structural variants (SVs) is critical in health and disease; however, detecting them remains a challenge. Several linked-read sequencing technologies, including 10X Genomics, TELL-Seq and single tube long fragment read (stLFR), have recently been developed as cost-effective approaches to reconstruct multi-megabase haplotypes (phase blocks) from the sequence data of a single sample. These technologies provide an optimal sequencing platform to characterize SVs, though few computational algorithms can utilize them. We therefore developed Aquila_stLFR, an approach that resolves SVs through haplotype-based assembly of stLFR linked-reads.

RESULTS

Aquila_stLFR first partitions long fragment reads into two haplotype-specific blocks with the assistance of a high-quality reference genome, taking advantage of the potential phasing ability of the linked-reads themselves. Each haplotype is then assembled independently to produce a complete diploid assembly, from which genome-wide SVs are reconstructed. We benchmarked Aquila_stLFR on a well-studied sample, NA24385, and showed that it can detect medium to large deletions (50 bp-10 kb) with high sensitivity and medium-size insertions (50 bp-1 kb) with high specificity.
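The partitioning step described above can be illustrated with a toy sketch. This is not the Aquila_stLFR implementation; it only shows the general idea of binning barcoded long fragments by haplotype, under the assumption that a phased set of heterozygous sites is already known. The function names (`assign_haplotype`, `partition_fragments`), the allele encoding (0 = reference, 1 = alternate) and the example data are all hypothetical.

```python
from collections import defaultdict


def assign_haplotype(fragment_alleles, hap0_alleles):
    """Return 0, 1, or None (ambiguous) for one barcoded fragment.

    fragment_alleles: {het-site position: allele observed on the fragment}
    hap0_alleles:     {het-site position: allele carried by haplotype 0}
    Haplotype 1 is assumed to carry the other allele at every site.
    """
    support = [0, 0]
    for pos, allele in fragment_alleles.items():
        if pos not in hap0_alleles:
            continue  # site is not informative for phasing
        if allele == hap0_alleles[pos]:
            support[0] += 1
        else:
            support[1] += 1
    if support[0] == support[1]:
        return None  # no evidence, or evidence is tied
    return 0 if support[0] > support[1] else 1


def partition_fragments(fragments, hap0_alleles):
    """Group fragment barcodes by the haplotype they best support."""
    bins = defaultdict(list)
    for barcode, alleles in fragments.items():
        bins[assign_haplotype(alleles, hap0_alleles)].append(barcode)
    return dict(bins)


# Hypothetical phased heterozygous sites and fragment observations.
hap0 = {100: 0, 250: 1, 400: 0}
fragments = {
    "bc1": {100: 0, 250: 1},  # agrees with haplotype 0 at both sites
    "bc2": {100: 1, 400: 1},  # agrees with haplotype 1 at both sites
    "bc3": {250: 1, 400: 1},  # one site each way, so ambiguous
    "bc4": {900: 0},          # covers no phased site, so ambiguous
}
result = partition_fragments(fragments, hap0)
print(result)
```

In a haplotype-based workflow of this kind, the reads behind each bin would then be assembled separately, and fragments left in the ambiguous bin would need a separate policy (for example, being discarded or used for both haplotypes); the sketch does not model either step.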

AVAILABILITY AND IMPLEMENTATION

Source code and documentation are available on https://github.com/maiziex/Aquila_stLFR.

SUPPLEMENTARY INFORMATION

Supplementary data are available online.
