Box embeddings for extending ontologies: a data-driven and interpretable approach.

Authors

Memariani Adel, Glauer Martin, Flügel Simon, Neuhaus Fabian, Hastings Janna, Mossakowski Till

Affiliations

Data Science Group (DICE), Heinz Nixdorf Institute, Paderborn University, Warburger Str. 100, 33098, Paderborn, North Rhine-Westphalia, Germany.

Institute for Intelligent Cooperating Systems, Otto von Guericke University, Universitätsplatz 2, 39106, Magdeburg, Saxony-Anhalt, Germany.

Publication information

J Cheminform. 2025 Sep 1;17(1):138. doi: 10.1186/s13321-025-01086-1.

DOI:10.1186/s13321-025-01086-1

PMID:40890838

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12403937/

Abstract

Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model's outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, which is an important knowledge base in the field of life science. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during the training process. SCIENTIFIC CONTRIBUTION: Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.
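The geometric idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model, which learns box boundaries during training; it only shows how fixed axis-aligned boxes could be read as classes, how a molecule embedded as a point would receive every label whose box contains it, and how nesting between boxes could be read as a subclass relation. The `Box` class, the helper functions, the two-dimensional coordinates, and the class names are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle given by its lower and upper corners."""
    lower: Sequence[float]
    upper: Sequence[float]

    def contains_point(self, point: Sequence[float]) -> bool:
        # A point belongs to the class if it lies inside the box on every axis.
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))

    def contains_box(self, other: "Box") -> bool:
        # Nesting: `other` lies entirely inside this box on every axis.
        return all(lo <= olo and ohi <= hi
                   for lo, hi, olo, ohi in zip(self.lower, self.upper,
                                               other.lower, other.upper))

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap if their intervals intersect on every axis.
        return all(max(lo, olo) <= min(hi, ohi)
                   for lo, hi, olo, ohi in zip(self.lower, self.upper,
                                               other.lower, other.upper))


def predict_labels(point: Sequence[float], boxes: Dict[str, Box]) -> List[str]:
    """Multi-label prediction: every class whose box contains the point."""
    return [name for name, box in boxes.items() if box.contains_point(point)]


def inferred_subclasses(boxes: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Read (child, parent) pairs off the geometry via box nesting."""
    return sorted((child, parent)
                  for child, cbox in boxes.items()
                  for parent, pbox in boxes.items()
                  if child != parent and pbox.contains_box(cbox))


# Hypothetical 2-D boxes; real embeddings would be learned and high-dimensional.
boxes = {
    "organic acid": Box((0.0, 0.0), (4.0, 4.0)),
    "carboxylic acid": Box((1.0, 1.0), (3.0, 3.0)),
    "alcohol": Box((3.5, 0.0), (6.0, 2.0)),
}

molecule = (2.0, 2.0)  # hypothetical embedding of a single molecule
labels = predict_labels(molecule, boxes)
subclass_pairs = inferred_subclasses(boxes)
```

In this toy configuration the "carboxylic acid" box is nested inside the "organic acid" box, so any point labeled with the former is automatically labeled with the latter. That is the sense in which a box representation keeps multi-label predictions consistent with a hierarchy and makes the hierarchy itself readable from the learned boundaries.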

