Identifying structural variants using linked-read sequencing data.

Author information

Elyanow Rebecca, Wu Hsin-Ta, Raphael Benjamin J

Affiliations

Center for Computational Molecular Biology, Brown University, Providence, RI, USA.

Department of Computer Science, Princeton University, Princeton, NJ, USA.

Publication

Bioinformatics. 2018 Jan 15;34(2):353-360. doi: 10.1093/bioinformatics/btx712.

Abstract

MOTIVATION

Structural variation, including large deletions, duplications, inversions, translocations and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (∼5 to 10) of DNA molecules, each ∼50 kbp in length, with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants.
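Because one barcode is shared by only a handful of ∼50 kbp molecules, reads that carry the same barcode and cluster on the same chromosome can be grouped into approximate molecule footprints. The following is a minimal sketch of that grouping under stated assumptions; it is not NAIBR's procedure. The `Read` and `Molecule` types, the `infer_molecules` function, and the `max_gap` and `min_reads` thresholds are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record: mapped position plus molecular barcode.
    chrom: str
    pos: int
    barcode: str


class Molecule(NamedTuple):
    # Approximate footprint of one barcoded DNA molecule.
    chrom: str
    start: int
    end: int
    barcode: str
    n_reads: int


def infer_molecules(reads: Iterable[Read], max_gap: int = 50_000,
                    min_reads: int = 2) -> List[Molecule]:
    """Group reads by (barcode, chromosome) and split into molecules at large gaps.

    Assumption: consecutive reads with the same barcode that are more than
    `max_gap` bp apart are treated as coming from different molecules.
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                if count >= min_reads:
                    molecules.append(Molecule(chrom, start, prev, barcode, count))
                start, count = pos, 0
            count += 1
            prev = pos
        # Close the last open molecule for this barcode/chromosome.
        if count >= min_reads:
            molecules.append(Molecule(chrom, start, prev, barcode, count))
    return sorted(molecules)


reads = [
    Read("chr1", 1_000, "AAA"), Read("chr1", 21_000, "AAA"),
    Read("chr1", 45_000, "AAA"), Read("chr1", 500_000, "AAA"),
    Read("chr1", 510_000, "AAA"), Read("chr2", 7_000, "AAA"),
]
for m in infer_molecules(reads):
    print(m)
```

In this toy input the single barcode `AAA` yields two molecules on chr1 (1,000 to 45,000 and 500,000 to 510,000); the lone chr2 read falls below `min_reads` and is discarded. A simple gap threshold is used only for readability; real data would also need to account for read depth and mapping quality.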

RESULTS

We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification, including two recent methods that also analyze linked reads, on simulated sequencing data and on 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes.
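One intuition behind using barcodes to detect a novel adjacency: two loci that lie much farther apart in the reference than a single ∼50 kbp molecule should rarely share barcodes, so an excess of shared barcodes suggests the loci are joined in the sample genome. The sketch below counts that one signal for a candidate pair of breakpoints. It is a simplified illustration under stated assumptions, not NAIBR's probabilistic model; the `Read` type, the `barcodes_near` and `adjacency_support` functions, and the `window` parameter are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Read(NamedTuple):
    # Hypothetical minimal read record: mapped position plus molecular barcode.
    chrom: str
    pos: int
    barcode: str


def barcodes_near(reads: Iterable[Read], chrom: str, pos: int,
                  window: int) -> Set[str]:
    """Barcodes of reads mapped within `window` bp of chrom:pos."""
    return {r.barcode for r in reads
            if r.chrom == chrom and abs(r.pos - pos) <= window}


def adjacency_support(reads: Iterable[Read], a: Tuple[str, int],
                      b: Tuple[str, int], window: int = 10_000) -> int:
    """Count barcodes observed near both candidate breakpoints a and b.

    Assumption: a and b are (chrom, pos) pairs far apart in the reference,
    so barcodes shared between them hint at a novel adjacency.
    """
    reads = list(reads)
    chrom_a, pos_a = a
    chrom_b, pos_b = b
    shared = (barcodes_near(reads, chrom_a, pos_a, window)
              & barcodes_near(reads, chrom_b, pos_b, window))
    return len(shared)


reads = [
    Read("chr1", 99_000, "B1"), Read("chr1", 905_000, "B1"),
    Read("chr1", 101_000, "B2"), Read("chr1", 895_000, "B2"),
    Read("chr1", 100_500, "B3"),
    Read("chr1", 900_000, "B4"),
]
print(adjacency_support(reads, ("chr1", 100_000), ("chr1", 900_000)))
```

Here barcodes B1 and B2 appear near both breakpoints, which are 800 kbp apart in the reference, so the support is 2. Since each barcode labels ∼5 to 10 molecules, some shared barcodes can arise by chance; a raw count like this would need to be compared against a background expectation, which is the kind of gap a probabilistic model combining several signals is meant to address.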

AVAILABILITY AND IMPLEMENTATION

Software is available at compbio.cs.brown.edu/software.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.