PROTAX-GPU: a scalable probabilistic taxonomic classification system for DNA barcodes.

Affiliations

Vector Institute for Artificial Intelligence, Toronto, Canada M5G 0C6.

Department of Computer Science, University of Toronto, Toronto, Canada M5S 2E4.

Publication information

Philos Trans R Soc Lond B Biol Sci. 2024 Jun 24;379(1904):20230124. doi: 10.1098/rstb.2023.0124. Epub 2024 May 6.

Abstract

DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPUs) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve a speedup of over 1000× compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
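To make the "similarity and nearest-neighbour operations" mentioned in the abstract more concrete, the following is a minimal JAX sketch of a batched similarity search. It is not the PROTAX-GPU implementation. The integer encoding, the `MISSING` code, the match-fraction similarity, and the helper names `encode`, `similarity`, `all_similarities` and `nearest_neighbours` are all assumptions made for illustration, and the sequences are assumed to be pre-aligned and of equal length.

```python
import jax
import jax.numpy as jnp

# Hypothetical encoding (an assumption, not taken from the paper):
# A, C, G, T -> 0..3; any other symbol (gap, N) -> MISSING.
_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
MISSING = 4


def encode(seq):
    """Turn an aligned DNA string into an int32 vector."""
    return jnp.array([_CODES.get(ch, MISSING) for ch in seq.upper()],
                     dtype=jnp.int32)


def similarity(query, ref):
    """Fraction of matching bases over positions observed in both sequences."""
    valid = (query != MISSING) & (ref != MISSING)
    matches = jnp.sum((query == ref) & valid)
    # Guard against division by zero when no position is shared.
    return matches / jnp.maximum(jnp.sum(valid), 1)


@jax.jit
def all_similarities(query, refs):
    """Compare one query against every reference row in a single batched call."""
    return jax.vmap(similarity, in_axes=(None, 0))(query, refs)


def nearest_neighbours(query, refs, k):
    """Return the k highest similarities and the indices of those references."""
    sims = all_similarities(query, refs)
    return jax.lax.top_k(sims, k)


# Toy reference library of four aligned 8-base sequences (illustrative data).
refs = jnp.stack([encode(s) for s in
                  ["ACGTACGT", "ACGTTCGT", "TTTTTTTT", "ACG-ACGA"]])
query = encode("ACGTACGT")

top_sims, top_idx = nearest_neighbours(query, refs, k=2)
print(top_idx, top_sims)
```

In this toy library the two best matches are the first reference (identical to the query) and the second (one mismatch in eight positions). The point of the sketch is the shape of the computation: `jax.vmap` expresses the per-reference comparison as one batched operation that JAX can compile for an accelerator, and `jax.lax.top_k` selects the nearest neighbours without sorting the full similarity vector.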

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a040/11070247/e094454a9f0f/rstb20230124f01.jpg
