Semi-Supervised Learning to Identify UMLS Semantic Relations.

Authors

Luo Yuan, Uzuner Ozlem

Affiliations

Massachusetts Institute of Technology.

Massachusetts Institute of Technology; State University of New York at Albany.

Publication information

AMIA Jt Summits Transl Sci Proc. 2014 Apr 7;2014:67-75. eCollection 2014.

Abstract

The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configurations. KL divergence consistently performs best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).
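
To make the clustering step concrete, here is a minimal Python sketch of seeded k-means using KL divergence as the distance. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the direction of the divergence, the smoothing, or the centroid update, so the sketch assumes KL(point || centroid), additive smoothing with a small `eps`, and arithmetic-mean centroids. The function names (`seeded_kmeans_kl`, `normalize`, `kl_divergence`) and the convention that `-1` marks an unlabeled point are hypothetical, and every cluster is assumed to have at least one seed.

```python
import numpy as np


def normalize(X, eps=1e-9):
    """Smooth non-negative feature rows and scale each to sum to 1 (assumed preprocessing)."""
    X = np.asarray(X, dtype=float) + eps
    return X / X.sum(axis=1, keepdims=True)


def kl_divergence(p, q):
    """KL(p || q) along the last axis, for strictly positive distributions."""
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def seeded_kmeans_kl(X, seed_labels, k, n_iter=50):
    """Seeded k-means: centroids start from the labeled seeds, then all points are re-clustered.

    seed_labels holds a cluster id in 0..k-1 for seeded points and -1 for unlabeled ones.
    """
    P = normalize(X)
    seed_labels = np.asarray(seed_labels)

    # Initialize each centroid as the mean of its seed points.
    centroids = np.stack([P[seed_labels == c].mean(axis=0) for c in range(k)])

    labels = np.full(len(P), -1)
    for _ in range(n_iter):
        # dists[i, c] = KL(P[i] || centroids[c]), computed by broadcasting.
        dists = kl_divergence(P[:, None, :], centroids[None, :, :])
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

        # Update each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = P[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```

In this variant the seeds only set the starting centroids, so a seeded point may later move to another cluster. Holding seed assignments fixed during the loop would be a different, constrained variant.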


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/27f1/4419772/577bfafa9c42/1861093f1.jpg
