Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA 94305, United States.
Clinical Artificial Intelligence Implementation and Research Lab (CAIRELab), Leiden University Medical Center, Leiden 2333ZN, The Netherlands.
J Am Med Inform Assoc. 2024 Oct 1;31(10):2255-2262. doi: 10.1093/jamia/ocae188.
This study aims to explore and develop tools for early identification of depression concerns among cancer patients by leveraging the novel data source of messages sent through a secure patient portal.
We developed classifiers based on logistic regression (LR), support vector machines (SVMs), and 2 Bidirectional Encoder Representations from Transformers (BERT) models (original and Reddit-pretrained) on 6600 patient messages from a cancer center (2009-2022), annotated by a panel of healthcare professionals. Performance was compared using AUROC scores, and model fairness and explainability were examined. We also examined correlations between model predictions and depression diagnosis and treatment.
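As a rough illustration of the comparison metric named above, AUROC can be read as the probability that a randomly chosen concerning message receives a higher score than a randomly chosen non-concerning one. The sketch below computes it directly from that definition. The labels and scores are invented toy values, and `model_a_scores` and `model_b_scores` are hypothetical names; none of this reproduces the study's data or pipeline.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy annotations (1 = depression concern) and toy scores from two
# hypothetical classifiers; these are NOT values from the study.
labels = [1, 0, 1, 0, 0, 1]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
model_b_scores = [0.6, 0.5, 0.4, 0.7, 0.1, 0.8]

auroc_a = auroc(labels, model_a_scores)
auroc_b = auroc(labels, model_b_scores)
print(f"model A AUROC = {auroc_a:.2f}, model B AUROC = {auroc_b:.2f}")
```

The pairwise form is quadratic in the number of messages, which is fine for a small example; a rank-based implementation would be the usual choice for thousands of messages.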
BERT and RedditBERT attained AUROC scores of 0.88 and 0.86, respectively, compared to 0.79 for LR and 0.83 for SVM. BERT showed larger differences in performance across sex, race, and ethnicity than RedditBERT. Patients who sent messages classified as concerning were more likely to receive a depression diagnosis, a prescription for antidepressants, or a referral to a psycho-oncologist. Explanations from BERT and RedditBERT differed, with no clear preference from annotators.
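One simple way to look at performance differences across demographic groups is to compute AUROC separately within each group and report the spread between the best and worst group. The sketch below does this under stated assumptions: the group labels `"A"` and `"B"`, the scores, and the helper `auroc_by_group` are all hypothetical, and the abstract does not specify which fairness measure the authors actually used.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(labels, scores, groups):
    """Split the examples by group label and compute AUROC within each group."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


# Toy data only: two hypothetical demographic groups with four messages each.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.9, 0.2]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = auroc_by_group(labels, scores, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"gap = {gap:.2f}")
```

A larger gap means the classifier ranks messages less reliably for some groups than for others. With small groups the per-group estimates are noisy, so confidence intervals would be needed before drawing conclusions.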
We show the potential of BERT and RedditBERT in identifying depression concerns in messages from cancer patients. Performance disparities across demographic groups highlight the need for careful consideration of potential biases. Further research is needed to address biases, evaluate real-world impacts, and ensure responsible integration into clinical settings.
This work represents a methodological advance in the early identification of depression concerns among cancer patients. By leveraging BERT-based models, it points toward a way to reduce clinical burden while improving overall patient care.