Manchev Yulian T, Burn Matthew J, Popelier Paul L A
Department of Chemistry, The University of Manchester, Manchester, UK.
J Comput Chem. 2024 Dec 15;45(32):2912-2928. doi: 10.1002/jcc.27477. Epub 2024 Aug 31.
We present ichor, an open-source Python library that simplifies data management in computational chemistry and streamlines the development of machine learning force fields. Ichor provides many easily extensible file management tools together with a lazy file reading system, allowing efficient management of hundreds of thousands of computational chemistry files. Data from calculations can be readily stored in databases for easy sharing and post-processing, and raw data can be processed directly by ichor to create machine-learning-ready datasets. In addition to these data-related capabilities, ichor provides interfaces to popular workload management software used on High Performance Computing clusters, so that thousands of separate calculations can be submitted with a single line of Python code. A simple-to-use command line interface, implemented as a series of menus, makes common ichor tasks more accessible and efficient. Finally, ichor implements general tools for visualizing and analyzing datasets, as well as tools for measuring machine learning model quality both on test-set data and in simulations. With its current functionality, ichor can serve as an end-to-end data procurement, data management, and analysis solution for machine learning force field development.
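The lazy file reading mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not ichor's actual API: the class name `LazyXYZFile`, its `atoms` and `is_loaded` attributes, and the `demo` helper are hypothetical names chosen for illustration, and the XYZ geometry format is assumed only as a convenient example. The idea shown is that creating the file object costs nothing, and the file is parsed only when its contents are first requested, which is what makes holding references to very large numbers of files affordable.

```python
from pathlib import Path
import tempfile


class LazyXYZFile:
    """Hypothetical lazily read XYZ geometry file (illustrative, not ichor's API)."""

    def __init__(self, path):
        self.path = Path(path)
        self._atoms = None  # nothing has been read from disk yet

    @property
    def is_loaded(self):
        # True once the file contents have been parsed and cached.
        return self._atoms is not None

    @property
    def atoms(self):
        # Parse the file on first access only, then reuse the cached result.
        if self._atoms is None:
            self._atoms = self._read()
        return self._atoms

    def _read(self):
        # XYZ layout: atom count, comment line, then one atom per line.
        lines = self.path.read_text().splitlines()
        natoms = int(lines[0])
        atoms = []
        for line in lines[2:2 + natoms]:
            symbol, x, y, z = line.split()
            atoms.append((symbol, float(x), float(y), float(z)))
        return atoms


def demo():
    """Write a small example file and show that reading is deferred."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "water.xyz"
        path.write_text(
            "3\nwater\n"
            "O 0.000 0.000 0.117\n"
            "H 0.000 0.757 -0.469\n"
            "H 0.000 -0.757 -0.469\n"
        )
        geometry = LazyXYZFile(path)
        before = geometry.is_loaded  # file not parsed yet
        symbols = [atom[0] for atom in geometry.atoms]  # triggers the read
        after = geometry.is_loaded
    return before, symbols, after
```

Calling `demo()` builds the object without touching the file contents and parses them only when `atoms` is accessed. A real library would likely add format detection, error handling, and cache invalidation, none of which are attempted here.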