Ennajari Hafsa, Bouguila Nizar, Bentahar Jamal
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3609-3623. doi: 10.1109/TNNLS.2021.3112045. Epub 2023 Jul 6.
Probabilistic topic models are considered an effective framework for text analysis that uncovers the main topics in an unlabeled set of documents. However, the topics inferred by traditional topic models are often unclear and hard to interpret because these models do not account for semantic structures in language. Recently, a number of topic modeling approaches have leveraged domain knowledge to enhance the quality of the learned topics, but they still assume a multinomial or Gaussian document likelihood in Euclidean space, which often results in information loss and poor performance. In this article, we propose a Bayesian embedded spherical topic model (ESTM) that combines both knowledge graph and word embeddings in a non-Euclidean curved space, the hypersphere, for better topic interpretability and discriminative text representations. Extensive experimental results show that our proposed model successfully uncovers interpretable topics and learns high-quality text representations useful for common natural language processing (NLP) tasks across multiple benchmark datasets.