Sun Bo, Zhang Fei, Li Jing, Yang Yicheng, Diao Xiaolin, Zhao Wei, Shu Ting
Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037, China.
Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China.
BMC Med Inform Decis Mak. 2021 Jun 26;21(1):199. doi: 10.1186/s12911-021-01554-2.
With the development and application of medical information systems, semantic interoperability has become essential for accurate, advanced health-related computing and for sharing electronic health record (EHR) information. The openEHR approach can improve semantic interoperability. One key advantage of openEHR is that it allows existing archetypes to be reused. The crucial problem is how to improve precision and resolve ambiguity in archetype retrieval.
Based on query expansion technology and the Word2Vec model in Natural Language Processing (NLP), we propose finding synonyms as substitutes for the original search terms in archetype retrieval. Test sets at different levels of medical expertise are used to verify feasibility.
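The substitution idea can be illustrated with a minimal sketch. It is not the authors' implementation: the vectors below are hand-written toy values standing in for trained Word2Vec embeddings, and the names `TOY_VECTORS`, `cosine`, and `find_substitutes` are hypothetical. It only shows how nearest neighbours in an embedding space can be proposed as substitutes for a search term.

```python
import math

# ASSUMPTION: toy 3-dimensional vectors standing in for trained Word2Vec
# embeddings. Real embeddings would be learned from a medical corpus.
TOY_VECTORS = {
    "blood pressure": [0.90, 0.10, 0.00],
    "bp": [0.85, 0.15, 0.05],
    "hypertension": [0.70, 0.30, 0.10],
    "heart rate": [0.20, 0.90, 0.10],
    "allergy": [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_substitutes(term, vectors, top_n=2):
    """Return the top_n vocabulary entries most similar to `term`."""
    if term not in vectors:
        return []  # out-of-vocabulary term: nothing to propose
    query = vectors[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


# Candidate substitutes for the original search term.
substitutes = find_substitutes("blood pressure", TOY_VECTORS)
print(substitutes)
```

In a retrieval setting, each substitute would be issued as an additional query against the archetype repository, and the results would be merged or compared with those of the original term.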
Applying the approach to each original search term (n = 120) in the test sets yielded a total of 69,348 substitutes. Precision at 5 (P@5) improved by 0.767 on average. In the best case, P@5 reached 0.975.
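For readers unfamiliar with the metric, P@5 is the fraction of the top five retrieved items that are relevant. The sketch below uses made-up archetype identifiers and relevance judgments purely for illustration; `precision_at_k` is a hypothetical helper, and none of the numbers come from the study.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the first k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


# ASSUMPTION: placeholder identifiers, not real archetypes or study data.
relevant = {"a", "b", "c", "d"}
results_original = ["a", "x", "y", "z", "w"]    # ranking for the original term
results_substitute = ["a", "b", "x", "c", "d"]  # ranking for a substitute term

p5_before = precision_at_k(results_original, relevant)
p5_after = precision_at_k(results_substitute, relevant)
improvement = p5_after - p5_before
print(p5_before, p5_after, round(improvement, 3))
```

An average improvement such as the one reported would then be the mean of these per-term differences over all search terms in the test sets.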
We introduce a novel approach that uses NLP technology and a corpus to find synonyms as substitutes for the original search terms. Compared with simply mapping the elements contained in openEHR to an external dictionary, this approach could greatly improve precision and resolve ambiguity in retrieval tasks. This helps promote the application of openEHR and advance EHR information sharing.