Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
Department of Medical Biology, The University of Melbourne, 1G Royal Parade, Parkville, VIC 3052, Australia.
GigaScience. 2021 Feb 16;10(2). doi: 10.1093/gigascience/giab003.
The data produced by long-read third-generation sequencers have unique characteristics compared to short-read sequencing data, often requiring tailored analysis tools for tasks ranging from quality control to downstream processing. Software addressing these challenges for different genomics applications is growing so rapidly that it is difficult to track, which makes it hard for users to choose the most appropriate tool for their analysis goal and for developers to identify areas of need and existing solutions to benchmark against.
We describe the implementation of long-read-tools.org, an open-source database that organizes the rapidly expanding collection of long-read data analysis tools and allows its exploration through interactive browsing and filtering. The current database release contains 478 tools across 32 categories. Most tools are developed in Python, and the most frequent analysis tasks include base calling, de novo assembly, error correction, quality checking/filtering, and isoform detection, while long-read single-cell data analysis and transcriptomics are areas with the fewest tools available.
Continued growth in the application of long-read sequencing in genomics research positions the long-read-tools.org database as an essential resource that allows researchers to keep abreast of both established and emerging software to help guide the selection of the most relevant tool for their analysis needs.