ITMO University, Saint Petersburg, Russia.
Federal Almazov North-west Medical Research Centre, Saint Petersburg, Russia.
Stud Health Technol Inform. 2021 Oct 27;285:94-99. doi: 10.3233/SHTI210579.
Electronic medical records (EMRs) contain a great deal of valuable patient data, but this data is unstructured. Labeled medical text data in Russian is scarce, and there are no tools for automatic annotation. We present an unsupervised approach to medical data annotation. Morphological and syntactic analysis of the input sentences produces syntactic trees; similar subtrees are then grouped using Word2Vec and labeled using dictionaries and Wikidata categories. The method can be used to label EMRs in Russian automatically, and the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
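The grouping and labeling steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering procedure, so a greedy cosine-similarity threshold is assumed here. The tiny hand-written vectors stand in for a trained Word2Vec model, subtrees are given directly as word tuples rather than produced by a parser, the label dictionary is hypothetical, and the Wikidata lookup is omitted.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings
# (hypothetical values chosen for illustration only).
TOY_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "severe": [0.8, 0.2, 0.1],
    "nausea": [0.85, 0.15, 0.05],
    "aspirin": [0.1, 0.9, 0.0],
    "ibuprofen": [0.15, 0.85, 0.05],
    "daily": [0.2, 0.7, 0.2],
}

# Hypothetical domain dictionary mapping words to annotation labels.
LABEL_DICT = {
    "headache": "symptom",
    "nausea": "symptom",
    "aspirin": "drug",
    "ibuprofen": "drug",
}


def subtree_vector(words):
    """Average the vectors of the known words in a subtree."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_subtrees(subtrees, threshold=0.9):
    """Greedily assign each subtree to the first group whose seed is similar enough."""
    groups = []
    for words in subtrees:
        vec = subtree_vector(words)
        if vec is None:
            continue  # no known words, so nothing to compare
        for group in groups:
            if cosine(vec, group["seed"]) >= threshold:
                group["members"].append(words)
                break
        else:
            groups.append({"seed": vec, "members": [words]})
    return groups


def label_group(group):
    """Label a group with the most frequent dictionary label among its words."""
    counts = Counter(
        LABEL_DICT[w]
        for words in group["members"]
        for w in words
        if w in LABEL_DICT
    )
    return counts.most_common(1)[0][0] if counts else "unknown"


if __name__ == "__main__":
    subtrees = [("severe", "headache"), ("nausea",), ("aspirin", "daily"), ("ibuprofen",)]
    for group in group_subtrees(subtrees):
        print(label_group(group), group["members"])
```

In a realistic setting the word tuples would come from subtrees of a dependency parse, and the vectors from a Word2Vec model trained on the EMR corpus; the threshold value is likewise an assumption that would need tuning.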