Functional Genomics Center Zurich (FGCZ), University of Zurich/ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland.
J Integr Bioinform. 2022 Sep 8;19(4). doi: 10.1515/jib-2022-0031. eCollection 2022 Dec 1.
Core facilities have to offer technologies that best serve the needs of their users and give them a competitive advantage in research. They have to set up and maintain between ten and a hundred instruments, which produce large amounts of data and serve thousands of active projects and customers. Particular emphasis has to be given to the reproducibility of the results. Increasingly, the entire process, from formulating the research hypothesis, conducting the experiments, and performing the measurements through to data exploration and analysis, is driven solely by very few experts in various scientific fields. Still, the ability to perform the entire data exploration in real time on a personal computer is often hampered by heterogeneous software, diverse output data formats, and enormous data sizes. These factors shape the design and architecture of the implemented software stack. At the Functional Genomics Center Zurich (FGCZ), a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich, we have developed the B-Fabric system, which for more than a decade has provided an entire life sciences community with fundamental data science support. In this paper, we sketch how such a system can be used to glue together data (including metadata), computing infrastructures (clusters and clouds), and visualization software to support instant data exploration and visual analysis. We illustrate the approach we use in daily operation with visualization applications for mass spectrometry data.