Institute for Medical Informatics, Bern University of Applied Sciences, Bern, Switzerland.
Stud Health Technol Inform. 2022 Jun 6;290:602-606. doi: 10.3233/SHTI220148.
In the medical domain, multiple ontologies and terminology systems are available. However, existing classification and prediction algorithms in the clinical domain often ignore or insufficiently utilize the semantic information provided in those ontologies. To address this issue, we introduce a concept for augmenting embeddings, the input to deep neural networks, with semantic information retrieved from ontologies. To do this, words and phrases of sentences are mapped to concepts of a medical ontology, so that synonyms are aggregated in the same concept. A semantically enriched vector is generated and used for sentence classification. We study our approach on a sentence classification task using a real-world dataset comprising 640 sentences belonging to 22 categories. A deep neural network model is defined with an embedding layer followed by two LSTM layers and two dense layers. Our experiments show that classification accuracy with content-enriched embeddings is, for some categories, higher than without enrichment. We conclude that semantic information from ontologies has potential to provide a useful enrichment of text. Future research will assess to what extent semantic relationships from the ontology can be used for enrichment.
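The mapping step described in the abstract, where synonymous words and phrases collapse onto one ontology concept before embedding, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy synonym table, the concept identifiers, the greedy longest-match strategy, and the choice to replace matched phrases with a concept token (rather than, say, appending it) are not taken from the paper.

```python
# Toy synonym table: every surface form maps to one concept ID.
# Terms and IDs are invented for illustration, not taken from a real ontology.
TOY_ONTOLOGY = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}

# Longest phrase (in tokens) that can match an ontology entry.
MAX_PHRASE_LEN = max(len(term.split()) for term in TOY_ONTOLOGY)


def enrich(sentence: str) -> list[str]:
    """Replace the longest matching phrase at each position with its concept ID."""
    tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
    enriched = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first (greedy longest match).
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TOY_ONTOLOGY:
                enriched.append(TOY_ONTOLOGY[phrase])
                i += n
                break
        else:
            # No ontology match: keep the original token.
            enriched.append(tokens[i])
            i += 1
    return enriched


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Assign an integer index to each distinct token; 0 is left free for padding."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


if __name__ == "__main__":
    a = enrich("patient had a heart attack")
    b = enrich("patient had a myocardial infarction")
    print(a)
    print(b)
    print(build_vocab([a, b]))
```

Under these assumptions, two sentences that differ only in the synonym used yield identical token sequences, so an embedding layer fed with the resulting indices would see a single shared entry for the concept instead of several unrelated surface forms.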