Visentin Luca, Scarpellino Giorgia, Chinigò Giorgia, Munaron Luca, Ruffinatti Federico Alessandro
Department of Life Sciences and Systems Biology, University of Turin, 10123 Turin, Italy.
Biology (Basel). 2022 Sep 13;11(9):1346. doi: 10.3390/biology11091346.
Tens of thousands of gene expression data sets describing a variety of model organisms in many different pathophysiological conditions are currently stored in publicly available databases such as the Gene Expression Omnibus (GEO) and ArrayExpress (AE). As microarray technology gives way to RNA-seq, it becomes strategic to develop high-level analysis tools that preserve access to this huge amount of information through the most sophisticated data preparation and processing methods developed over the years, while also ensuring the reproducibility of the results. To meet this need, we present bioTEA (biological Transcript Expression Analyzer), a novel software tool that combines ease of use with the versatility and power of an R/Bioconductor-based differential expression analysis, from raw data retrieval and preparation through gene annotation. BioTEA is an R-coded pipeline, wrapped in a Python-based command line interface and containerized with Docker. The user can choose among multiple options (including gene filtering, batch effect handling, sample pairing, and statistical test type) to adapt the algorithm flow to the structure of the particular data set. All these options are saved in a single text file, which can be easily shared between laboratories to deterministically reproduce the results. In addition, a detailed log file provides accurate information about each step of the analysis. Overall, these features make bioTEA an invaluable tool for both bioinformaticians and wet-lab biologists interested in transcriptomics. BioTEA is free and open-source.
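The idea of keeping every analysis option in a single shareable text file can be illustrated with a minimal Python sketch. This is not bioTEA's actual options format or API: the class `AnalysisOptions`, its field names, the default values, and the choice of JSON are all assumptions made for illustration only. The sketch shows why a deterministic serialization of the options helps two laboratories confirm they are running the same configuration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisOptions:
    """Hypothetical option set; the field names are NOT bioTEA's real keys."""

    min_log2_expression: float = 4.0     # gene filtering threshold (assumed)
    batch_column: Optional[str] = None   # metadata column used for batch handling
    paired_samples: bool = False         # whether samples are paired
    test_type: str = "parametric"        # label for the statistical test (assumed)


def dump_options(options: AnalysisOptions) -> str:
    """Serialize the options to text; sorted keys give a stable byte layout."""
    return json.dumps(asdict(options), sort_keys=True, indent=2)


def load_options(text: str) -> AnalysisOptions:
    """Rebuild the options object from the shared text."""
    return AnalysisOptions(**json.loads(text))


if __name__ == "__main__":
    options = AnalysisOptions(batch_column="batch", paired_samples=True)
    shared_text = dump_options(options)
    # A second laboratory reloads the same text and obtains identical options.
    assert load_options(shared_text) == options
    print(shared_text)
```

Sorting the keys means that identical options always produce identical text, so the file can be compared or checksummed directly when it is exchanged.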