Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan.
Bioinformatics. 2014 Sep 15;30(18):2559-67. doi: 10.1093/bioinformatics/btu360. Epub 2014 May 29.
Chromosomal rearrangements, which arise from the atypical breaking and rejoining of DNA molecules, are observed in many cancer-related diseases. Rearrangements are typically detected by using short reads generated by next-generation sequencing (NGS) in combination with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, this intermediate comparison via a reference genome may lead to loss of information.
In this article, we propose a reference-free method for detecting clusters of breakpoints arising from chromosomal rearrangements. It works by directly comparing a set of normal NGS reads with another set that may contain rearrangements. Our method, SlideSort-BPR (breakpoint reads), is based on a fast algorithm for all-against-all comparison of short reads and on theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it correctly finds ∼88% of the breakpoints with no false-positive reads. Moreover, an evaluation on a real prostate cancer dataset shows that the proposed method correctly predicts more fusion transcripts than previous approaches while producing fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.
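The abstract does not spell out how reads are compared or how the neighbor counts are analyzed, so the following is only a minimal sketch of the general idea under our own assumptions. Here a "neighbor" of a query read is a normal read that overlaps it at an offset of at most `max_shift` bases with at most `max_mismatch` mismatches in the overlap. We further assume that read start positions follow a Poisson model, so reads spanning a rearrangement junction should have implausibly few normal neighbors. The function names, parameters, and Poisson threshold are hypothetical and are not SlideSort-BPR's implementation. In particular, the quadratic scan below stands in for the much faster all-pairs similarity search the method relies on.

```python
import math
import random


def count_neighbors(read, reference_reads, max_shift, max_mismatch):
    """Count reads in `reference_reads` that overlap `read` at some offset t,
    |t| <= max_shift, with at most `max_mismatch` mismatches in the overlap.

    Hypothetical neighbor definition; assumes all reads have the same length.
    This is a naive quadratic scan, not SlideSort's all-pairs search.
    """
    n = len(read)
    if max_shift >= n:
        raise ValueError("max_shift must be smaller than the read length")
    count = 0
    for other in reference_reads:
        for t in range(-max_shift, max_shift + 1):
            # Treat `other` as starting t bases after `read`, so that
            # read[i] lines up with other[i - t] on the overlapping region.
            if t >= 0:
                a, b = read[t:], other[: n - t]
            else:
                a, b = read[: n + t], other[-t:]
            if sum(x != y for x, y in zip(a, b)) <= max_mismatch:
                count += 1
                break  # count each reference read at most once
    return count


def expected_neighbors(depth, read_len, max_shift):
    """Expected neighbor count if read starts are uniform: about depth / read_len
    reads start per position, over a window of 2 * max_shift + 1 positions."""
    return (2 * max_shift + 1) * depth / read_len


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def is_breakpoint_candidate(count, depth, read_len, max_shift, alpha=0.01):
    """Flag a read whose neighbor count is implausibly low under the
    (assumed) Poisson model of read start positions."""
    lam = expected_neighbors(depth, read_len, max_shift)
    return poisson_cdf(count, lam) < alpha


if __name__ == "__main__":
    # Toy data: a random "genome", normal reads starting at every position
    # (so interior depth equals the read length), and a hypothetical
    # deletion that joins position 300 directly to position 400.
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(600))
    L = 50
    normal = [genome[i:i + L] for i in range(len(genome) - L + 1)]
    rearranged = genome[:300] + genome[400:]
    for name, r in [("ordinary", rearranged[100:100 + L]),
                    ("junction", rearranged[275:275 + L])]:
        c = count_neighbors(r, normal, max_shift=5, max_mismatch=0)
        flag = is_breakpoint_candidate(c, depth=L, read_len=L, max_shift=5)
        print(name, c, flag)
```

With these toy settings the expected neighbor count is λ = 11. The ordinary read should have about that many normal neighbors and is not flagged. The read spanning the artificial junction should have essentially none, since any shifted normal read mismatches on one side of the junction, so only that read is reported as a breakpoint candidate.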
The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.