Verdi Kacy K, Ellis Heidi JC, Gryk Michael R
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
BMC Bioinformatics. 2007 Jan 30;8:31. doi: 10.1186/1471-2105-8-31.
Scientific workflows improve the process of scientific experiments by making computations explicit, underscoring data flow, and emphasizing the participation of humans in the process when intuition and human reasoning are required. Workflows for experiments also highlight transitions among experimental phases, allowing intermediate results to be verified and supporting the proper handling of semantic mismatches and different file formats among the various tools used in the scientific process. Thus, scientific workflows are important for the modeling and subsequent capture of bioinformatics-related data. While much research has been conducted on the implementation of scientific workflows, the initial process of actually designing and generating the workflow at the conceptual level has received little consideration.
We propose a structured process for capturing scientific workflows at the conceptual level. The process allows workflows to be documented efficiently, yields concise workflow models and more-correct workflow implementations, and provides insight into the scientific process itself. The approach uses three modeling techniques to model the structural, data flow, and control flow aspects of the workflow. We demonstrate the process in the domain of biomolecular structure determination using Nuclear Magnetic Resonance (NMR) spectroscopy, applying it to capture the workflow for conducting biomolecular analysis with NMR spectroscopy.
Using the approach, we were able to accurately document, in a short amount of time, numerous steps in the process of conducting an experiment using NMR spectroscopy. The resulting models are correct and precise: outside validation identified only minor omissions. In addition, the models provide an accurate visual description of the control flow for conducting biomolecular analysis in an NMR spectroscopy experiment.