Identifying synonymy between relational phrases using word embeddings.

Author information

Nguyen Nhung T H, Miwa Makoto, Tsuruoka Yoshimasa, Tojo Satoshi

Affiliations

University of Science, Vietnam National University, Ho Chi Minh City, 227 Nguyen Van Cu St., Ward 4, Dist. 5, Ho Chi Minh City, Viet Nam; Japan Advanced Institute of Science and Technology, 1-8 Asahidai, Nomi-shi, Ishikawa 923-1292, Japan.

Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya 468-8511, Japan.

Publication information

J Biomed Inform. 2015 Aug;56:94-102. doi: 10.1016/j.jbi.2015.05.010. Epub 2015 May 22.

Abstract

Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.
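
The pipeline described in the abstract (encode each relational phrase as a vector, then cluster the vectors with k-means) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy word vectors, the phrase list, the averaging of word vectors to form a phrase vector, and the deterministic farthest-first initialization are all assumptions made for illustration; a real system would learn its embeddings from a large biomedical corpus.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-made for illustration only.
WORD_VECTORS = {
    "activates": [0.9, 0.1, 0.0],
    "induces": [0.8, 0.2, 0.1],
    "stimulates": [0.85, 0.15, 0.05],
    "inhibits": [0.1, 0.9, 0.0],
    "suppresses": [0.15, 0.85, 0.1],
    "blocks": [0.2, 0.8, 0.05],
    "expression": [0.5, 0.5, 0.9],
    "of": [0.4, 0.4, 0.4],
}

# Hypothetical relational phrases to be grouped into synonym clusters.
PHRASES = [
    "activates",
    "induces expression of",
    "stimulates",
    "inhibits",
    "suppresses expression of",
    "blocks",
]


def embed_phrase(phrase):
    """Encode a phrase as the mean of its word vectors (an assumed scheme)."""
    vectors = [WORD_VECTORS[w] for w in phrase.split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first initialization (deterministic)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(distance(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans([embed_phrase(p) for p in PHRASES], k=2)
for phrase, label in zip(PHRASES, labels):
    print(label, phrase)
```

With these toy vectors, the activation-like phrases and the inhibition-like phrases end up in separate clusters. The number of clusters `k` is chosen by hand here; how it is set in practice is not stated in the abstract.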
