SeedHit: A GPU Friendly Pre-Align Filtering Algorithm.

Author information

Ju Zhen, Zhang Jingjing, Li Xuelei, Meng Jintao, Wei Yanjie

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1794-1802. doi: 10.1109/TCBB.2024.3417517. Epub 2024 Dec 10.

Abstract

The amount of genetic data generated by Next Generation Sequencing (NGS) technologies grows faster than Moore's law, which necessitates efficient algorithms for NGS data processing and analysis. A filter placed before the computationally costly analysis step can significantly reduce the run time of NGS data analysis. As GPUs are orders of magnitude more powerful than CPUs, this paper proposes a GPU-friendly pre-align filtering algorithm named SeedHit for the fast processing of NGS data. Inspired by BLAST, SeedHit counts seed hits between two sequences to determine their similarity. In SeedHit, each nucleotide in a gene sequence is represented in binary format. By packing the data and generating a lookup table that fits into the L1 cache, SeedHit is GPU-friendly and achieves high throughput. Using three 16S rRNA datasets from Greengenes as input, SeedHit rejects 84%-89% of dissimilar sequence pairs on average when the similarity is 0.9-0.99. The throughput of SeedHit reaches 1 T/s (tera bases per second) on a 3080 Ti. Compared with two other GPU-based filtering algorithms, GateKeeper and SneakySnake, SeedHit has the highest rejection rate and throughput. Incorporating SeedHit into our in-house clustering algorithm nGIA yields a 1.6-2.1 times speedup over the original version.
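
The abstract names three ingredients: a binary encoding of nucleotides, a compact lookup table, and a count of seed hits between two sequences. The Python sketch below illustrates how these could fit together on a CPU. It is not the authors' GPU implementation. The 2-bit code (A=0, C=1, G=2, T=3), the seed length `k`, the one-bit-per-seed table layout, and the acceptance threshold taken from the q-gram lemma are all assumptions made for illustration, and every function name is hypothetical.

```python
# Illustrative sketch only: the encoding, table layout, and threshold below
# are assumptions, not details taken from the SeedHit paper.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit nucleotide code


def encode_kmers(seq, k):
    """Yield every k-mer of seq as an integer packed 2 bits per base.

    Raises KeyError on characters other than A, C, G, T.
    """
    mask = (1 << (2 * k)) - 1
    word = 0
    for i, base in enumerate(seq):
        # Shift in the new base and drop the base that left the window.
        word = ((word << 2) | CODE[base]) & mask
        if i >= k - 1:
            yield word


def build_table(seq, k):
    """Return a bit table with 4**k entries; bit j is set if k-mer j is in seq."""
    table = bytearray((4 ** k + 7) // 8)
    for w in encode_kmers(seq, k):
        table[w >> 3] |= 1 << (w & 7)
    return table


def count_seed_hits(table, seq, k):
    """Count the k-mer positions of seq whose k-mer is present in the table."""
    return sum((table[w >> 3] >> (w & 7)) & 1 for w in encode_kmers(seq, k))


def passes_filter(a, b, k, max_edits):
    """Keep the pair only if b has enough seed hits against a.

    Threshold from the q-gram lemma (an assumed criterion): each edit can
    destroy at most k of the len(b) - k + 1 k-mers of b.
    """
    needed = len(b) - k + 1 - k * max_edits
    return count_seed_hits(build_table(a, k), b, k) >= needed


if __name__ == "__main__":
    ref = "ACGTACGTTGCAAGCT"
    near = "ACGTACGTAGCAAGCT"  # one substitution relative to ref
    far = "TTTTTTTTTTTTTTTT"
    print(passes_filter(ref, near, k=4, max_edits=1))
    print(passes_filter(ref, far, k=4, max_edits=1))
```

With `k = 8`, a one-bit-per-seed table of this kind has 4^8 = 65,536 entries and occupies 8 KiB, which gives a sense of how a seed table can stay small enough for an L1 cache. Whether SeedHit uses this seed length or this layout is not stated in the abstract.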

