School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China.
Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China.
Sci Data. 2024 Jun 7;11(1):600. doi: 10.1038/s41597-024-03448-0.
A scalable, reusable, broad-coverage unified representation of materials knowledge would greatly benefit data sharing among materials communities. A knowledge graph (KG) for materials terminology, that is, a formal collection of term entities and the relationships between them, is a key conceptual step toward this goal. In this work, we propose such a KG, named the Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is constructed automatically from a text corpus using natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English, consisting of 8,660 terms and their explanations. It covers 11 principal categories, such as Metals, Composites, and Nanomaterials, each with two or three levels of subcategories, for a total of 235 distinct category labels. To support further application, we developed a knowledge web system based on MGED-KG, which improves data sharing efficiency through query expansion, term recommendation, and data recommendation.
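To make the idea of query expansion over a terminology graph concrete, the following is a minimal sketch, not the MGED-KG implementation. It assumes terms are stored as nodes linked by labeled relations, and it expands a query term to every term reachable within a fixed number of hops. The class name `TermGraph`, the relation label `subcategory_of`, and the example terms in the usage note are hypothetical illustrations; the abstract does not describe the actual schema or expansion algorithm.

```python
from collections import defaultdict


class TermGraph:
    """Toy terminology graph: term entities linked by labeled relations.

    Hypothetical structure for illustration only; it is not the MGED-KG schema.
    """

    def __init__(self):
        # Maps each term to a set of (relation, neighboring term) pairs.
        self._edges = defaultdict(set)

    def add_relation(self, head, relation, tail):
        # Store the edge in both directions so that expansion can move
        # from a category to its subcategories and back again.
        self._edges[head].add((relation, tail))
        self._edges[tail].add((relation, head))

    def expand_query(self, term, max_hops=1):
        """Return the term plus all terms within max_hops relations of it."""
        seen = {term}
        frontier = [term]
        for _ in range(max_hops):
            next_frontier = []
            for current in frontier:
                # .get avoids creating empty entries for unknown terms.
                for _relation, neighbor in sorted(self._edges.get(current, ())):
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return sorted(seen)
```

For example, after adding the hypothetical relations ("stainless steel", "subcategory_of", "steel") and ("steel", "subcategory_of", "Metals"), a one-hop expansion of "steel" would also retrieve records indexed under "stainless steel" and "Metals". A real system would likely weight relation types differently rather than treating all edges alike.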