Antanas Vaitkus, Andrius Merkys, Saulius Gražulis
Department of Protein-DNA Interactions, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio al. 7, LT-10257, Vilnius, Lithuania.
Faculty of Mathematics and Informatics, Vilnius University, Naugarduko g. 24, LT-03225, Vilnius, Lithuania.
J Appl Crystallogr. 2021 Feb 14;54(Pt 2):661-672. doi: 10.1107/S1600576720016532. eCollection 2021 Apr 1.
Data curation practices of the Crystallography Open Database (COD) are described, with additional focus on formal validation using the Crystallographic Information Framework (CIF). A program capable of validating CIF files against both DDL1 and DDLm dictionaries is presented and used to process the entire COD. Validation results collected from over 450 000 CIF files are shown to be a useful resource both for data maintenance and for the development of the underlying ontologies. A set of programs intended to aid the migration of dictionaries from DDL1 to DDLm is also presented.
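The abstract does not describe how the validator works internally. As a rough illustration of what checking a CIF file against a dictionary involves, the Python sketch below validates single-line data items against a tiny hypothetical dictionary. The names `Definition`, `TOY_DICTIONARY`, `parse_simple_items` and `validate` are invented for illustration; the data names mirror ones from the core CIF dictionary, but their definitions here are simplified assumptions. Only three kinds of check are shown: whether a data name is defined, whether a numeric value parses (optionally with a standard uncertainty in parentheses), and whether a value respects a range or an enumeration. Loops, multi-line text fields, DDLm-specific features and real dictionary parsing are out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, heavily simplified dictionary definition.
# Real DDL1/DDLm definitions carry many more attributes.
@dataclass
class Definition:
    type: str                          # "numb" or "char"
    enumeration: Tuple[str, ...] = ()  # allowed values; empty means unrestricted
    minimum: Optional[float] = None    # inclusive lower bound for numbers
    maximum: Optional[float] = None    # inclusive upper bound for numbers

# Toy dictionary (assumption): names borrowed from the core CIF dictionary.
TOY_DICTIONARY = {
    "_cell_length_a": Definition("numb", minimum=0.0),
    "_cell_length_c": Definition("numb", minimum=0.0),
    "_cell_angle_alpha": Definition("numb", minimum=0.0, maximum=180.0),
    "_symmetry_cell_setting": Definition("char", enumeration=(
        "triclinic", "monoclinic", "orthorhombic", "tetragonal",
        "rhombohedral", "trigonal", "hexagonal", "cubic")),
}

# CIF numbers may carry a standard uncertainty in parentheses, e.g. 5.4307(2).
NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")

def parse_simple_items(text):
    """Yield (name, raw_value) pairs from single-line '_name value' items only."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                # CIF data names are case-insensitive, so normalise them.
                yield parts[0].lower(), parts[1].strip()

def validate(text, dictionary):
    """Return a list of human-readable issues found in the text."""
    issues = []
    for name, value in parse_simple_items(text):
        definition = dictionary.get(name)
        if definition is None:
            issues.append(f"{name}: not defined in dictionary")
            continue
        if value in ("?", "."):  # unquoted unknown / inapplicable placeholders
            continue
        value = value.strip("'\"")  # naive unquoting of simple quoted strings
        if definition.type == "numb":
            if not NUMB.match(value):
                issues.append(f"{name}: '{value}' is not a number")
                continue
            number = float(value.split("(")[0])  # drop the uncertainty part
            if definition.minimum is not None and number < definition.minimum:
                issues.append(f"{name}: {number} below minimum {definition.minimum}")
            if definition.maximum is not None and number > definition.maximum:
                issues.append(f"{name}: {number} above maximum {definition.maximum}")
        if definition.enumeration and value.lower() not in definition.enumeration:
            issues.append(f"{name}: '{value}' not in enumeration")
    return issues
```

For example, `validate("_cell_angle_alpha 190\n", TOY_DICTIONARY)` reports that the angle exceeds its maximum. Running such checks over every file in a collection and aggregating the messages is the general kind of process that produces the per-file validation results the abstract refers to, although the COD tooling itself is far more complete than this sketch.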