Ultrafast SNP analysis using the Burrows-Wheeler transform of short-read data.

Author information

Kimura Kouichi, Koike Asako

Affiliation

Biosystems Research Department, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan.

Publication information

Bioinformatics. 2015 May 15;31(10):1577-83. doi: 10.1093/bioinformatics/btv024. Epub 2015 Jan 20.

Abstract

MOTIVATION

Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed.

RESULTS

The BWT makes it possible to process all read fragments that share the same sequence simultaneously; accordingly, SNPs were found from the BWT much faster than from the mapping results. Finding SNPs from the BWT (together with supplementary data, the fragment depth of coverage [FDC]) took only a few minutes on a desktop workstation for human exome or transcriptome sequencing data, and 20 min on a dual-CPU server for human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except where the use of read fragments led to sensitivity loss or where sequencing depth was insufficient. These exceptions were predictable in advance on the basis of the minimum length for uniqueness (MLU) and the FDC defined on the reference genome. Moreover, the BWT and FDC were computed in less time than it took to obtain the mapping results, provided that the data were large enough.
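To illustrate why a BWT of the reads groups fragments of the same sequence, here is a toy sketch. It is not the authors' algorithm and does not model FDC or MLU. It builds a naive BWT of a small read collection by sorting all suffixes, then looks for fixed-length contexts that are preceded by more than one base. The function names, the `k` context length, and the `min_count` threshold are all hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict


def read_collection_bwt(reads):
    """Naive BWT of a read collection (illustrative only, quadratic memory).

    Returns the sorted list of (suffix, read_index, preceding_char).
    Reading preceding_char down the list gives the BWT string.
    """
    entries = []
    for idx, read in enumerate(reads):
        text = read + "$"  # '$' terminates each read and sorts before A/C/G/T
        for pos in range(len(text)):
            # '$' as preceding char marks a suffix that starts a read
            prev = text[pos - 1] if pos > 0 else "$"
            entries.append((text[pos:], idx, prev))
    entries.sort()  # ties between identical suffixes are broken by read index
    return entries


def candidate_variants(reads, k, min_count=2):
    """Report k-base contexts preceded by two or more well-supported bases.

    Suffixes sharing the same first k bases are adjacent in the sorted order,
    so their preceding characters form one contiguous run of the BWT.
    """
    groups = defaultdict(Counter)
    for suffix, _, prev in read_collection_bwt(reads):
        # need k real bases (suffix still carries its trailing '$')
        if len(suffix) > k and prev != "$":
            groups[suffix[:k]][prev] += 1
    return {
        context: counts
        for context, counts in groups.items()
        if sum(1 for c in counts.values() if c >= min_count) >= 2
    }


if __name__ == "__main__":
    # Hypothetical reads covering one site with two alleles (T vs C).
    reads = ["GATTACAGG", "GATTACAGG", "GATCACAGG", "GATCACAGG"]
    print("".join(prev for _, _, prev in read_collection_bwt(reads)))
    print(candidate_variants(reads, k=4))
```

In this toy data, every fragment starting with the context `ACAG` lands in one block of the sorted suffixes, so the two alleles show up as a mix of preceding bases in a single run of the BWT. A practical tool would use a compressed index rather than explicit suffix sorting.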
