Clinical Comparable Corpus Describing the Same Subjects with Different Expressions.

Author information

Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan.

The Department of Radiology, The University of Tokyo Hospital, Bunkyo, Tokyo, Japan.

Publication information

Stud Health Technol Inform. 2022 Jun 6;290:253-257. doi: 10.3233/SHTI220073.

Abstract

Medical artificial intelligence (AI) systems need to learn to recognize synonyms or paraphrases describing the same anatomy, disease, treatment, etc. to better understand real-world clinical documents. Existing linguistic resources focus on variants at the word or sentence level. To handle linguistic variations on a broader scale, we proposed the Medical Text Radiology Report section Japanese version (MedTxt-RR-JA), the first clinical comparable corpus. MedTxt-RR-JA was built by recruiting nine radiologists to diagnose the same 15 lung cancer cases in Radiopaedia, an open-access radiological repository. The 135 radiology reports in MedTxt-RR-JA were shown to contain word-, sentence- and document-level variations maintaining similarity of contents. MedTxt-RR-JA is also the first publicly available Japanese radiology report corpus that would help to overcome poor data availability for Japanese medical AI systems. Moreover, our methodology can be applied widely to building clinical corpora without privacy concerns.
