Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, 00161, Rome, Italy.
Department of Health Technology, Technical University of Denmark, Anker Engelunds Vej 101, 2800, Kongens Lyngby, Denmark.
BMC Bioinformatics. 2024 Aug 21;25(1):272. doi: 10.1186/s12859-024-05887-3.
The availability of transcriptomic data for species without a reference genome enables the construction of de novo transcriptome assemblies from RNA-Seq data as alternative reference resources. A transcriptome provides direct information about a species' protein-coding genes under specific experimental conditions. The de novo assembly process produces a unigenes file in FASTA format, which then serves as the input for annotation. Homology-based annotation, which infers the function of sequences by estimating their similarity to other sequences in a reference database, is a computationally demanding procedure.
To mitigate the computational burden, we introduce HPC-T-Annotator, a tool for de novo transcriptome homology annotation on high-performance computing (HPC) infrastructures, designed for straightforward configuration via a Web interface. Once the configuration data are provided, the entire parallel computing software for annotation is generated automatically and can be launched on a supercomputer with a simple command line. The output data can then be inspected with post-processing utilities, provided as Python notebooks integrated in the proposed software.
HPC-T-Annotator expedites homology-based annotation in de novo transcriptome assemblies. Its efficient parallelization strategy on HPC infrastructures significantly reduces computational load and execution times, enabling large-scale transcriptome analysis and comparison projects, while its intuitive graphical interface extends accessibility to users without IT skills.
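The abstract does not describe the parallelization strategy in detail. As a minimal sketch of one common way to parallelize homology annotation, the unigenes FASTA file can be partitioned into chunks of roughly equal total sequence length, so that each chunk can be aligned against the reference database by an independent job. The function names `read_fasta` and `split_records` are hypothetical and are not taken from HPC-T-Annotator; the greedy balancing heuristic is likewise an assumption made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Greedily assign records to n_chunks, balancing total sequence length.

    Hypothetical heuristic: process the longest sequences first and always
    add the next record to the chunk that currently has the smallest load.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        lightest = loads.index(min(loads))
        chunks[lightest].append(rec)
        loads[lightest] += len(rec[1])
    return chunks


if __name__ == "__main__":
    fasta_text = ">u1\nACGTACGT\n>u2\nACGT\n>u3\nGGCC\n>u4\nAT\n"
    for i, chunk in enumerate(split_records(read_fasta(fasta_text.splitlines()), 2)):
        print(i, [header for header, _ in chunk])
```

Each resulting chunk could then be written to its own FASTA file and submitted as a separate alignment job, with the per-chunk outputs concatenated afterwards. Balancing by total residues rather than by sequence count is a design choice that aims to keep job run times comparable when transcript lengths vary widely.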