Costa Raquel L, Gadelha Luiz, Ribeiro-Alves Marcelo, Porto Fábio
DEXL Lab, National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, Brazil.
National Institute of Cancer (INCA), Rio de Janeiro, RJ, Brazil.
PeerJ. 2017 Jul 5;5:e3509. doi: 10.7717/peerj.3509. eCollection 2017.
Transcriptome analysis involves many steps, from acquiring raw data to selecting a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these networks may additionally be integrated with other biological databases covering protein-protein interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, which hinders both later inspection of the results and meta-analysis that incorporates new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its metadata are challenging tasks. Running in silico experiments also takes considerable effort to structure and compose the information as needed for analysis: different programs may need to be applied, and different files are produced during the experiment cycle. In this context, a platform that supports experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the biological systems under evaluation. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clustering and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows users to set parameters, execute GeNNet-Wf, and visualize the results of its executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes whose biological functions were then analyzed.
The results are integrated into GeNNet-DB, a database of genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to the components of GeNNet. The derived data allows users to test previous hypotheses about an experiment and to explore new ones through the interactive graph database environment. GeNNet supports the analysis of human, rhesus macaque, mouse and rat data generated on Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
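To illustrate the kind of reasoning a graph model of genes, clusters and experiments makes possible, the following is a minimal Python sketch. It is not the GeNNet-DB schema or query language: the TinyGraph class, the relationship names (SELECTED, IN_CLUSTER, INTERACTS_WITH), the node identifiers and the gene symbols are all hypothetical and chosen only for illustration. The query asks which pairs of genes selected in an experiment both interact and fall in the same cluster.

```python
from collections import defaultdict


class TinyGraph:
    """A minimal in-memory property graph (illustrative, not GeNNet-DB)."""

    def __init__(self):
        self.nodes = {}                # node id -> {"label": ..., other properties}
        self.edges = defaultdict(set)  # (source id, relationship type) -> target ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target):
        self.edges[(source, rel_type)].add(target)

    def neighbors(self, source, rel_type):
        # .get avoids creating empty entries in the defaultdict
        return self.edges.get((source, rel_type), set())


# Hypothetical toy data: one experiment, two clusters, three genes.
g = TinyGraph()
g.add_node("exp1", "Experiment", name="hypothetical single-factor experiment")
g.add_node("c1", "Cluster")
g.add_node("c2", "Cluster")
for symbol, cluster in [("GENE_A", "c1"), ("GENE_B", "c1"), ("GENE_C", "c2")]:
    g.add_node(symbol, "Gene")
    g.add_edge("exp1", "SELECTED", symbol)     # assumed: differentially expressed
    g.add_edge(symbol, "IN_CLUSTER", cluster)  # assumed: cluster membership

# Hypothetical protein-protein interactions between the selected genes.
g.add_edge("GENE_A", "INTERACTS_WITH", "GENE_B")
g.add_edge("GENE_A", "INTERACTS_WITH", "GENE_C")


def coclustered_interactions(graph, experiment_id):
    """Return interacting gene pairs selected in the experiment that share a cluster."""
    selected = graph.neighbors(experiment_id, "SELECTED")
    pairs = []
    for gene in sorted(selected):
        for partner in sorted(graph.neighbors(gene, "INTERACTS_WITH")):
            shared = graph.neighbors(gene, "IN_CLUSTER") & graph.neighbors(partner, "IN_CLUSTER")
            if partner in selected and shared:
                pairs.append((gene, partner))
    return pairs


print(coclustered_interactions(g, "exp1"))
```

In this toy graph only the GENE_A and GENE_B pair satisfies both conditions, since GENE_C sits in a different cluster. A graph database would express the same traversal declaratively as a pattern over nodes and relationships rather than as nested loops.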