
Optimizing word embeddings for small dataset: a case study on patient portal messages from breast cancer patients.

Affiliations

Vanderbilt University, Nashville, TN, 37240, USA.

Brown University, Providence, RI, 02903, USA.

Publication information

Sci Rep. 2024 Jul 12;14(1):16117. doi: 10.1038/s41598-024-66319-z.

DOI: 10.1038/s41598-024-66319-z
PMID: 38997332
Full text (PMC): https://pmc.ncbi.nlm.nih.gov/articles/PMC11245534/

Abstract

Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. Analyzing these messages requires natural language processing. Word embedding models such as word2vec can extract meaningful signals from text, but they are not readily applicable to patient portal messages: embedding models typically require millions of training samples to represent semantics adequately, whereas the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce PK-word2vec (where PK stands for prior knowledge), a novel adaptation of the word2vec model for small-scale message corpora. PK-word2vec takes the most similar terms for medical words (including problems, treatments, and tests) and for non-medical words from two pre-trained embedding models and uses them as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which asked reviewers to compare how relevant a given word was to two groups: its five most similar words generated by PK-word2vec and its five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset comprised 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words.
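
The abstract states only that PK-word2vec uses the nearest neighbours of each word in two pre-trained embedding models (one for medical words, one for non-medical words) as prior knowledge during training; it does not give the training rule. The Python sketch below illustrates one plausible reading, which is an assumption and not necessarily the authors' method: look up each corpus word's top-k neighbours by cosine similarity in the appropriate pre-trained model, keep only neighbours that also occur in the small corpus, and append them as extra (center, context) pairs to the ordinary skip-gram training pairs. The function names, the toy vectors, the `medical_words` set (assumed to be supplied by a separate medical-term tagger), and the in-vocabulary filter are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(word: str, model: Dict[str, Vector], k: int = 5) -> List[str]:
    """The k nearest neighbours of `word` in a pre-trained embedding model."""
    if word not in model:
        return []  # no prior knowledge available for this word
    target = model[word]
    scored = [(cosine(target, vec), other) for other, vec in model.items() if other != word]
    scored.sort(key=lambda item: (-item[0], item[1]))  # most similar first, ties by name
    return [other for _, other in scored[:k]]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Standard skip-gram (center, context) pairs for one tokenized message."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def pk_augmented_pairs(
    messages: List[List[str]],
    medical_model: Dict[str, Vector],
    general_model: Dict[str, Vector],
    medical_words: Set[str],
    k: int = 5,
    window: int = 2,
) -> List[Tuple[str, str]]:
    """Skip-gram pairs plus hypothetical prior-knowledge pairs.

    Medical words take neighbours from `medical_model`, all other words from
    `general_model`; only neighbours that also occur in the corpus are kept.
    """
    vocab = {w for msg in messages for w in msg}
    pairs = [p for msg in messages for p in skipgram_pairs(msg, window)]
    for word in sorted(vocab):
        model = medical_model if word in medical_words else general_model
        pairs.extend((word, n) for n in top_k_similar(word, model, k) if n in vocab)
    return pairs


# Toy 2-d "pre-trained" vectors and messages (invented for illustration).
general = {"pain": [1.0, 0.1], "ache": [0.9, 0.2], "happy": [-1.0, 0.3]}
medical = {"tamoxifen": [0.2, 1.0], "letrozole": [0.25, 0.95], "biopsy": [1.0, -0.5]}
messages = [["tamoxifen", "pain"], ["letrozole", "ache"]]

pairs = pk_augmented_pairs(messages, medical, general, {"tamoxifen", "letrozole"}, k=1)
print(len(pairs))  # 8: four skip-gram pairs plus four prior-knowledge pairs
```

The resulting pairs would then be fed to an ordinary skip-gram trainer (for example, with negative sampling), which is omitted here. Adding pairs instead of changing the loss leaves the word2vec objective untouched, at the cost of treating prior-knowledge neighbours exactly like observed co-occurrences.
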
In over 90% of the tasks, both groups of reviewers indicated that PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference between the evaluations by AMT workers and by medical students was negligible for all comparisons of the two groups' task choices (paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advance in processing patient portal messages.
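
The evaluation described above compares, task by task, how often each reviewer group preferred the PK-word2vec neighbours over the standard word2vec neighbours, and then compares the two groups with a paired t-test. The sketch below shows that arithmetic under stated assumptions: each judgement is recorded as `"pk"` or `"std"`, a group is taken to favour PK-word2vec on a task when more than half of its judgements pick it (a hypothetical majority rule), and the paired t-test is run on the two groups' per-task preference rates. All function names and the toy judgements are invented for illustration and are not the study's data.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def pk_preference_rate(choices: List[str]) -> float:
    """Fraction of one task's judgements that picked the PK-word2vec group."""
    return sum(choice == "pk" for choice in choices) / len(choices)


def share_of_tasks_favouring_pk(rates_a: List[float], rates_b: List[float]) -> float:
    """Fraction of tasks on which BOTH reviewer groups favour PK-word2vec.

    A group favours PK-word2vec on a task when more than half of its
    judgements pick it (hypothetical majority rule).
    """
    both = sum(a > 0.5 and b > 0.5 for a, b in zip(rates_a, rates_b))
    return both / len(rates_a)


def paired_t(rates_a: List[float], rates_b: List[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched per-task rates."""
    if len(rates_a) != len(rates_b) or len(rates_a) < 2:
        raise ValueError("need two equally long samples with at least 2 tasks")
    diffs = [a - b for a, b in zip(rates_a, rates_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy judgements for four tasks (invented, not study data).
amt_choices = [["pk", "pk", "std"], ["pk", "pk", "pk"], ["pk", "std", "pk"], ["pk", "pk", "pk"]]
student_choices = [["pk", "std"], ["pk", "pk"], ["pk", "pk"], ["pk", "pk"]]

amt_rates = [pk_preference_rate(c) for c in amt_choices]
student_rates = [pk_preference_rate(c) for c in student_choices]

print(share_of_tasks_favouring_pk(amt_rates, student_rates))  # 0.75
t_stat, df = paired_t(amt_rates, student_rates)
print(f"t = {t_stat:.3f}, df = {df}")
```

This function returns only the t statistic and degrees of freedom; a library routine such as `scipy.stats.ttest_rel(amt_rates, student_rates)` also reports the two-sided p-value.
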
