Blekos Kostas, Chairetakis Kostas, Lynch Iseult, Marcoulaki Effie
Institute of Nuclear and Radiological Sciences and Technology, Energy and Safety, National Centre for Scientific Research "Demokritos", 15341, Agia Paraskevi, Greece.
School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.
J Cheminform. 2023 Apr 12;15(1):44. doi: 10.1186/s13321-022-00669-6.
Efficient, machine-readable representations are needed to accurately identify, validate and communicate information about chemical structures. Many such representations have been developed, for example the Simplified Molecular-Input Line-Entry System (SMILES) and the IUPAC International Chemical Identifier (InChI), each offering advantages specific to particular use cases. Representation of the multi-component structures of nanomaterials (NMs), though, remains out of scope for all currently available standards, as the nature of NMs poses new challenges for formalizing the encoding of their structure, interactions and environmental parameters. In this work we identify a set of principles that an NM representation should adhere to in order to provide "machine-friendly" encodings of NMs, i.e. encodings that facilitate machine processing and cooperation with nanoinformatics tools. We illustrate our principles by showing how the recently introduced InChI-based NM representation might be augmented, in principle, to also encode morphology and mixture properties and distributions of properties, to capture auxiliary information, and to allow data reuse.