Miranda Oshin, Kiehl Sophie, Qi Xiguang, Ryan Neal David, Kirisci Levent, Brannock M Daniel, Kosten Thomas, Wang Yanshan, Wang LiRong
University of Pittsburgh.
Colorado State University.
Res Sq. 2024 Feb 21:rs.3.rs-3973337. doi: 10.21203/rs.3.rs-3973337/v1.
Extracting Research Domain Criteria (RDoC) from high-risk populations, such as patients with post-traumatic stress disorder (PTSD), is crucial for improving mental health outcomes and informing policy. However, collecting, integrating, and effectively leveraging clinical notes for this purpose is complex.
In this study, we created a natural language processing (NLP) workflow to analyze electronic medical record (EMR) data and to identify and extract RDoC using a pre-trained transformer-based natural language model, all-mpnet-base-v2. We built dictionaries from 100,000 clinical notes and analyzed 5.67 million clinical notes from 38,807 PTSD patients from the University of Pittsburgh Medical Center. We then demonstrated the value of our approach by extracting and visualizing RDoC information in two use cases: (i) across multiple patient populations and (ii) throughout various disease trajectories.
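To make the matching step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that dictionary terms and note sentences have already been embedded as vectors (for example, by a sentence transformer) and that a sentence is tagged with a domain when its cosine similarity to any dictionary term for that domain reaches a threshold. The function names, the domain labels, and the three-dimensional toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)

def tag_sentence(sentence_vec, domain_dictionary, threshold=0.3):
    """Map each matched domain to its best similarity score.

    domain_dictionary maps a domain name to a list of term embeddings.
    A domain is kept when its best-matching term scores >= threshold.
    """
    hits = {}
    for domain, term_vecs in domain_dictionary.items():
        best = max((cosine(sentence_vec, t) for t in term_vecs), default=0.0)
        if best >= threshold:
            hits[domain] = best
    return hits

# Toy 3-d "embeddings" (assumption: real embeddings would have hundreds
# of dimensions and come from the embedding model, not be hand-written).
toy_dictionary = {
    "negative_valence": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "arousal_regulatory": [[0.0, 1.0, 0.0]],
    "sensorimotor": [[0.0, 0.0, 1.0]],
}
toy_sentence = [0.8, 0.5, 0.0]

tags = tag_sentence(toy_sentence, toy_dictionary)
for domain, score in sorted(tags.items()):
    print(f"{domain}: {score:.2f}")
```

In this toy case the sentence vector is orthogonal to the only `sensorimotor` term, so that domain is not tagged, while the other two domains pass the threshold.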
The sentence transformer model achieved superior macro F1 scores across all RDoC domains and performed best at a cosine similarity threshold of 0.3, which yielded an F1 score of at least 80% in every RDoC domain. The study revealed consistent reductions in all six RDoC domains among PTSD patients after psychotherapy. Women showed the highest levels of abnormality in sensorimotor systems, whereas veterans showed the highest levels of abnormality in negative and positive valence systems. The domains following a first diagnosis of PTSD were associated with heightened cue reactivity to trauma, suicide, alcohol, and substance consumption.
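As an illustration of how a similarity threshold can be chosen by macro F1, the sketch below computes a per-domain F1 score, averages the scores with equal weight per domain, and sweeps a few candidate thresholds. It is a sketch under stated assumptions rather than the study's evaluation code: the function names are hypothetical, and the similarity scores and gold labels are invented toy values that do not come from the study.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 over multi-label predictions.

    gold and pred are parallel lists of label sets, one set per sentence.
    """
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

def sweep_thresholds(similarities, gold, labels, thresholds):
    """Return {threshold: macro F1} when a label is predicted at score >= threshold."""
    results = {}
    for t in thresholds:
        pred = [{lab for lab, s in row.items() if s >= t} for row in similarities]
        results[t] = macro_f1(gold, pred, labels)
    return results

# Invented toy data (assumption): per-sentence similarity to each domain,
# plus hand-assigned gold labels. None of these numbers are study results.
labels = ["neg", "pos"]
similarities = [
    {"neg": 0.62, "pos": 0.10},
    {"neg": 0.35, "pos": 0.28},
    {"neg": 0.05, "pos": 0.45},
    {"neg": 0.25, "pos": 0.05},
]
gold = [{"neg"}, {"neg"}, {"pos"}, set()]

results = sweep_thresholds(similarities, gold, labels, [0.2, 0.3, 0.5])
best_threshold = max(results, key=results.get)
print(results, best_threshold)
```

A low threshold admits false positives and a high threshold drops true matches, which is why a sweep of this kind tends to peak at an intermediate value.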
The findings provide initial insights into RDoC functioning in different populations and disease trajectories. Natural language processing proves valuable for capturing real-time, context-dependent RDoC instances from extensive clinical notes.