EMSL, Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352, USA.
J Cheminform. 2013 May 24;5(1):25. doi: 10.1186/1758-2946-5-25.
Multidisciplinary integrated research requires the ability to couple the diverse sets of data obtained from a range of complex experiments and computer simulations. Integrating data requires semantically rich information. In this paper an end-to-end use of semantically rich data in computational chemistry is demonstrated utilizing the Chemical Markup Language (CML) framework. Semantically rich data is generated by the NWChem computational chemistry software with the FoX library and utilized by the Avogadro molecular editor for analysis and visualization.
The NWChem computational chemistry software has been modified and coupled to the FoX library to write CML-compliant XML data files. The FoX library was expanded to represent the lexical input files and molecular orbitals used by the computational chemistry software. Draft dictionary entries and a format for molecular orbitals within CML CompChem were developed. The Avogadro application was extended to read CML data and to display molecular geometry and electronic structure in the GUI. This gives an end-to-end solution: Avogadro creates input structures and generates input files, NWChem runs the calculation, and Avogadro then reads and analyses the CML output produced. The developments outlined in this paper will be made available in future releases of NWChem, FoX, and Avogadro.
CML-compliant XML files for computational chemistry software such as NWChem can be produced relatively easily using the FoX library. The CML data can be read by a newly developed reader in Avogadro and analysed or visualized in various ways. A community-based effort is needed to further develop the CML CompChem convention and dictionary. This will enable the long-term goal of allowing a researcher to run simple "Google-style" searches of chemistry and physics and have the results of computational calculations returned in a comprehensible form alongside articles from the published literature.
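As a rough illustration of what reading CML data involves, the sketch below parses a small hand-written CML molecule with Python's standard library. It is not Avogadro's reader and not actual NWChem output: the water geometry, the `SAMPLE_CML` string, and the `read_atoms` helper are assumptions made up for this example. Only the basic CML `molecule`/`atomArray`/`atom` elements and the CML schema namespace are used; the CompChem convention and its dictionary entries are not covered.

```python
import xml.etree.ElementTree as ET

# Namespace of the CML schema; the "cml" prefix is only a local alias.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical, hand-written example document (not produced by NWChem).
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
</molecule>
"""


def read_atoms(cml_text):
    """Return a list of (element symbol, (x, y, z)) tuples from a CML string."""
    root = ET.fromstring(cml_text)
    atoms = []
    # Find every <atom> element below the root, whatever its nesting depth.
    for atom in root.iterfind(".//cml:atom", CML_NS):
        symbol = atom.get("elementType")
        # x3/y3/z3 hold the 3D Cartesian coordinates of the atom.
        xyz = tuple(float(atom.get(axis)) for axis in ("x3", "y3", "z3"))
        atoms.append((symbol, xyz))
    return atoms


if __name__ == "__main__":
    for symbol, (x, y, z) in read_atoms(SAMPLE_CML):
        print(f"{symbol:2s} {x:8.3f} {y:8.3f} {z:8.3f}")
```

Because every element and attribute carries an explicit name, a consumer can pick out the geometry without knowing anything about the layout of the program's native text output, which is the practical benefit of semantically rich output that the abstract argues for.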