Centre de Recherche du CHU de Québec-Université Laval, Université Laval, G1V 4G2, Québec, Canada.
Département de Médecine Moléculaire, G1V 0A6, Québec, Canada.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad348.
The emergence of massive datasets exploring the multiple levels of molecular biology has made their analysis and knowledge transfer more complex. Flexible tools for managing large biological datasets could greatly help standardize the use of the data visualization and integration methods that have been developed. Business intelligence (BI) tools have been used as exploratory tools in many fields. They provide numerous connectors that link diverse data repositories to a unified graphical interface, offering an overview of the data and facilitating interpretation by decision makers. BI tools could therefore be a flexible and user-friendly way of handling molecular biology data through interactive visualizations. However, such tools are rarely used to explore massive and complex datasets in the biological sciences. We believe that two main obstacles could explain this. First, we posit that the ways data are imported into BI tools are not compatible with biological databases. Second, BI tools may not be adapted to certain particularities of complex biological data, namely their size, the variability of datasets and the availability of specialized visualizations. This paper highlights the use of five BI tools (Elastic Kibana, Siren Investigate, Microsoft Power BI, Salesforce Tableau and Apache Superset) that are compatible with Elasticsearch, a repository engine for managing massive data. Four case studies are discussed in which these BI tools were applied to biological datasets with different characteristics. We conclude that the performance of the tools depends on the complexity of the biological questions and the size of the datasets.