Predicting health-related quality of life change using natural language processing in thyroid cancer.

Author information

Lian Ruixue, Hsiao Vivian, Hwang Juwon, Ou Yue, Robbins Sarah E, Connor Nadine P, Macdonald Cameron L, Sippel Rebecca S, Sethares William A, Schneider David F

Affiliations

University of Wisconsin, Madison, USA.

University of Wisconsin, Madison Department of Electrical and Computer Engineering, USA.

Publication information

Intell Based Med. 2023;7. doi: 10.1016/j.ibmed.2023.100097. Epub 2023 Mar 15.

DOI:10.1016/j.ibmed.2023.100097

PMID:37664403

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10473865/

Abstract

BACKGROUND

Patient-reported outcomes (PRO) allow clinicians to measure health-related quality of life (HRQOL) and understand patients' treatment priorities, but obtaining PRO requires surveys which are not part of routine care. We aimed to develop a preliminary natural language processing (NLP) pipeline to extract HRQOL trajectory based on deep learning models using patient language.

MATERIALS AND METHODS

Our data consisted of transcribed interviews of 100 patients undergoing surgical intervention for low-risk thyroid cancer, paired with HRQOL assessments completed during the same visits. Our outcome measure was HRQOL trajectory measured by the SF-12 physical and mental component scores (PCS and MCS), and average THYCA-QoL score. We constructed an NLP pipeline based on BERT, a modern deep language model that captures contextual semantics, to predict HRQOL trajectory as measured by the above endpoints. We compared this to baseline models using logistic regression and support vector machines trained on bag-of-words representations of transcripts obtained using Linguistic Inquiry and Word Count (LIWC). Finally, given the modest dataset size, we implemented two data augmentation methods to improve performance: first by generating synthetic samples via GPT-2, and second by changing the representation of available data via sequence-by-sequence pairing, which is a novel approach.
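To make the bag-of-words baseline more concrete, the following is a minimal sketch of how category-level word counts might be derived from a transcript. It is an illustration only, not the authors' pipeline: the lexicon `TOY_LEXICON`, its category names, and the function `category_features` are hypothetical, and the actual LIWC dictionary is proprietary and far larger.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only.
# The real LIWC dictionary is proprietary and much larger.
TOY_LEXICON = {
    "negative_emotion": {"worried", "afraid", "tired", "pain"},
    "positive_emotion": {"good", "better", "relieved", "happy"},
    "body": {"neck", "throat", "voice", "scar"},
}


def category_features(transcript):
    """Return, for each category, the fraction of tokens belonging to it."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter()
    for token in tokens:
        for category, words in TOY_LEXICON.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty transcripts
    return {category: counts[category] / total for category in TOY_LEXICON}


features = category_features(
    "I was worried about my voice, but I feel better now."
)
```

Each transcript becomes a fixed-length vector of category proportions, which is the kind of input a logistic regression or support vector machine can be trained on. Word order and context are discarded, which is the limitation a contextual model such as BERT is meant to address.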

RESULTS

A BERT-based deep learning model, with GPT-2 synthetic sample augmentation, demonstrated an area under the curve (AUC) of 76.3% in the classification of HRQOL accuracy as measured by PCS, compared to the baseline logistic regression and bag-of-words model, which had an AUC of 59.9%. The sequence-by-sequence pairing method for augmentation had an AUC of 71.2% when used with the BERT model.
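For readers less familiar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the example labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example values, not results from the paper.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(example_labels, example_scores)
```

Under this reading, an AUC of 50% corresponds to random ranking and 100% to perfect separation, which gives some context for the gap between 59.9% and 76.3% reported above.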

CONCLUSIONS

NLP methods show promise in extracting PRO from unstructured narrative data, and in the future may aid in assessing and forecasting patients' HRQOL in response to medical treatments. Our experiments with optimization methods suggest larger amounts of novel data would further improve performance of the classification model.
