Brown David M L, Cho Herman, de Jong Wibe A
Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352 USA.
Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA 99352 USA.
J Cheminform. 2016 Feb 9;8:8. doi: 10.1186/s13321-016-0120-z. eCollection 2016.
The testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Experiment/theory comparisons can often be viewed as a workflow composed of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated against results from spectroscopic experiments. Such workflows can be laborious and time consuming to perform manually, so software that orchestrates the operations and transfers results between computers in a seamless, automated fashion would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.
An automated workflow has been implemented for the integrated analysis of data from nuclear magnetic resonance (NMR) experiments and electronic structure calculations. The open source Kepler software (Altintas et al. 2004) was used to coordinate the processing and transfer of data at each step of the workflow. The workflow incorporated several open source software components, including an electronic structure code to compute NMR parameters, a program to simulate NMR signals, and NMR data processing programs, among others. Kepler proved flexible enough to address several minor implementation challenges without recourse to other software solutions. The automated workflow was demonstrated with data from a [Formula: see text] NMR study of uranyl salts described previously (Cho et al. in J Chem Phys 132:084501, 2010).
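The chain of steps described above (compute NMR parameters, simulate a signal, compare with processed experimental data) can be illustrated with a minimal Python sketch. This is not Kepler and does not use its API; it only shows the general idea of an orchestrator passing results from one step to the next. All names (`Step`, `run_workflow`, `compute_parameters`, `simulate_spectrum`, `compare`) are hypothetical, the "electronic structure" step is a stand-in that returns a user-supplied value, and the single Lorentzian line shape is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One workflow stage: reads the shared context, returns new entries."""
    name: str
    run: Callable[[Dict], Dict]


def run_workflow(steps: List[Step], context: Dict) -> Dict:
    """Run the steps in order, merging each step's outputs into the context."""
    for step in steps:
        context = {**context, **step.run(context)}
    return context


def lorentzian(x: float, center: float, width: float) -> float:
    """Unit-height Lorentzian with full width at half maximum `width`."""
    half = width / 2.0
    return half ** 2 / ((x - center) ** 2 + half ** 2)


def compute_parameters(ctx: Dict) -> Dict:
    # Stand-in for an electronic structure calculation (assumption):
    # it simply forwards a user-supplied placeholder shift.
    return {"predicted_shift": ctx["guess_shift"]}


def simulate_spectrum(ctx: Dict) -> Dict:
    # Simulate a one-line spectrum on the experimental frequency axis.
    line = [lorentzian(x, ctx["predicted_shift"], ctx["linewidth"])
            for x in ctx["axis"]]
    return {"simulated": line}


def compare(ctx: Dict) -> Dict:
    # Root-mean-square difference between simulation and experiment.
    sq = [(s - e) ** 2 for s, e in zip(ctx["simulated"], ctx["experimental"])]
    return {"rms_error": math.sqrt(sum(sq) / len(sq))}


WORKFLOW = [
    Step("compute NMR parameters", compute_parameters),
    Step("simulate NMR signal", simulate_spectrum),
    Step("compare with experiment", compare),
]
```

In a real deployment each step would typically launch a separate program, possibly on a different computer, and the context would carry file locations rather than in-memory lists; the sketch keeps everything in one process so that the data flow between steps is easy to follow.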
The working implementation of an automated process linking NMR data with electronic structure predictions demonstrates that modern software tools such as Kepler can be used to construct programs that comprehensively manage complex, multi-step scientific workflows spanning several different computers. Automating the workflow can greatly accelerate the pace of discovery, and allows researchers to focus on the fundamental scientific questions rather than on mastering specialized software and data processing techniques. Future developments that would expand the scope and power of this approach include tools to standardize data and associated metadata formats, and interactive user interfaces that allow real-time exploration of the effects of program inputs on calculated outputs.