Refining electronic medical records representation in manifold subspace.

Affiliation

College of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Publication information

BMC Bioinformatics. 2022 Apr 1;23(1):115. doi: 10.1186/s12859-022-04653-7.

Abstract

BACKGROUND

Electronic medical records (EMR) contain detailed information about patient health. Developing an effective representation model is of great significance for downstream applications of EMR. However, processing the data directly is difficult because EMR data are incomplete, unstructured, and redundant. Preprocessing the original data is therefore a key step in EMR data mining. Classic distributed word representations ignore the geometric features of the word vectors when representing EMR data, and so often underestimate the similarities between similar words and overestimate the similarities between distant words. As a result, the word similarities obtained from embedding models are inconsistent with human judgment, and much valuable medical information is lost.

RESULTS

In this study, we propose a biomedical word embedding framework based on manifold subspace. The proposed model first obtains word vector representations of the EMR data and then re-embeds the word vectors in the manifold subspace. We develop an efficient optimization algorithm for neighborhood preserving embedding based on manifold optimization. To verify the algorithm, we perform experiments on intrinsic evaluation and external classification tasks; the experimental results demonstrate its advantages over the baseline methods.
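The re-embedding step can be illustrated with a minimal sketch of classic neighborhood preserving embedding (NPE), solved in closed form as a generalized eigenproblem. This is not the manifold optimization algorithm developed in the paper, whose details the abstract does not give. The function name `npe_projection`, its parameters, and the random stand-in vectors are all assumptions made for illustration.

```python
import numpy as np


def npe_projection(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Fit a linear NPE map; returns a (d, n_components) projection matrix.

    X is an (n_samples, d) array with one word vector per row.
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # center the vectors
    n, d = X.shape

    # Step 1: brute-force k nearest neighbours (squared Euclidean distance).
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    neighbors = np.argsort(dist, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights, as in locally linear embedding.
    # Each row of W reconstructs x_i from its neighbours and sums to one.
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[neighbors[i]] - X[i]              # neighbours relative to x_i
        G = Z @ Z.T                             # local Gram matrix
        G += reg * (np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, ones)
        W[i, neighbors[i]] = w / w.sum()

    # Step 3: solve A a = lambda B a for the smallest eigenvalues,
    # with A = X^T (I - W)^T (I - W) X and B = X^T X (regularized).
    I_W = np.eye(n) - W
    A = X.T @ (I_W.T @ I_W) @ X
    B = X.T @ X
    B += reg * (np.trace(B) / d + 1e-12) * np.eye(d)

    # Reduce to a standard symmetric problem via Cholesky: B = L L^T.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    C = (C + C.T) / 2.0                         # remove round-off asymmetry
    _, eigvecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    return L_inv.T @ eigvecs[:, :n_components]


# Usage with random stand-in vectors (real input would be pretrained
# word embeddings learned from EMR text).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(60, 10))
P = npe_projection(vectors, n_neighbors=5, n_components=3)
refined = (vectors - vectors.mean(axis=0)) @ P
print(refined.shape)  # (60, 3)
```

The projection keeps the directions along which each vector is best reconstructed from its nearest neighbours, which is how NPE preserves local geometry. A full pipeline would then compare similarities between the refined vectors against human judgments.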

CONCLUSIONS

Manifold learning subspace embedding can enhance distributed word representations of electronic medical record texts. It makes unstructured electronic medical record text data easier for researchers to process, which has biomedical research value.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a504/8973530/b4a83f8c41ac/12859_2022_4653_Fig1_HTML.jpg
