Exploring a method for extracting concerns of multiple breast cancer patients in the domain of patient narratives using BERT and its optimization by domain adaptation using masked language modeling.

Affiliations

Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan.

Nara Institute of Science and Technology, Nara, Japan.

Publication information

PLoS One. 2024 Sep 6;19(9):e0305496. doi: 10.1371/journal.pone.0305496. eCollection 2024.

Abstract

Narratives posted on the internet by patients contain a vast amount of information about various concerns. This study aimed to extract multiple concerns from interviews with breast cancer patients using the natural language processing (NLP) model Bidirectional Encoder Representations from Transformers (BERT). A total of 508 interview transcriptions of breast cancer patients, written in Japanese, were labeled with five types of concern labels: "treatment," "physical," "psychological," "work/financial," and "family/friends." The labeled texts were used to create a multi-label classifier by fine-tuning a pre-trained BERT model. Prior to fine-tuning, we also created several classifiers with domain adaptation using (1) breast cancer patients' blog articles and (2) breast cancer patients' interview transcriptions. The performance of the classifiers was evaluated in terms of precision through 5-fold cross-validation. The multi-label classifiers with only fine-tuning had precision values of over 0.80 for two of the five concerns, "physical" and "work/financial." In contrast, precision for "treatment" was low, at approximately 0.25. For the classifiers using domain adaptation, however, the precision of this label ranged from 0.40 to 0.51, with some cases improving by more than 0.2. This study showed that combining domain adaptation with a multi-label classifier on target data made it possible to efficiently extract multiple concerns from interviews.
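The evaluation described in the abstract (precision per concern label, measured under 5-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. It does not reproduce the authors' pipeline: the function names `per_label_precision` and `k_fold_indices`, the contiguous fold split, the toy label vectors, and the convention of reporting 0.0 when a label is never predicted are all assumptions made here for illustration. Only the five label names and the sample count of 508 come from the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# The five concern labels named in the abstract.
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def per_label_precision(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: Sequence[str] = LABELS,
) -> Dict[str, float]:
    """Compute precision = TP / (TP + FP) separately for each label column.

    Each row is one text; each column is a 0/1 indicator for one label.
    Assumption: a label that is never predicted gets precision 0.0.
    """
    result = {}
    for j, name in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        result[name] = tp / (tp + fp) if (tp + fp) else 0.0
    return result


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs. Assumption: no
    shuffling or stratification; the abstract does not say how folds were built.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Toy multi-label data (hypothetical, not from the study).
    y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 0, 0]]
    y_pred = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 0, 0, 0]]
    for label, value in per_label_precision(y_true, y_pred).items():
        print(f"{label}: {value:.2f}")

    # Fold sizes for the 508 transcriptions mentioned in the abstract.
    print([len(test) for _, test in k_fold_indices(508, k=5)])
```

In a real run, one classifier would be trained on each fold's training indices and scored on its test indices, and the per-label precision values would then be aggregated across the five folds; whether the study averaged them or pooled predictions is not stated in the abstract.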
