Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
Colorado State University, Fort Collins, CO, 80521, USA.
BMC Med Inform Decis Mak. 2024 Jun 4;24(1):154. doi: 10.1186/s12911-024-02554-8.
Extracting Research Domain Criteria (RDoC) information from high-risk populations, such as patients with post-traumatic stress disorder (PTSD), is important for improving mental health outcomes and informing policy. However, collecting, integrating, and effectively using clinical notes for this purpose is complex.
In this study, we created a natural language processing (NLP) workflow that analyzes electronic medical record (EMR) data and identifies and extracts RDoC information using a pre-trained transformer-based sentence embedding model, all-mpnet-base-v2. We built dictionaries from 100,000 clinical notes and then analyzed 5.67 million clinical notes from 38,807 PTSD patients at the University of Pittsburgh Medical Center. We demonstrated the value of the approach by extracting and visualizing RDoC information in two use cases: (i) across multiple patient populations and (ii) across disease trajectories.
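The matching step in such a workflow can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the domain keys, and the three-dimensional toy vectors are all hypothetical. In a real pipeline the vectors would be sentence embeddings produced by all-mpnet-base-v2, and the rule of keeping a domain when its best-matching dictionary term clears the threshold is an assumption made here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def match_domains(sentence_vec, dictionary_vecs, threshold=0.3):
    """Return {domain: best similarity} for domains at or above the threshold.

    dictionary_vecs maps a domain name to a non-empty list of term vectors
    (hypothetical structure, assumed for this sketch).
    """
    matches = {}
    for domain, term_vecs in dictionary_vecs.items():
        best = max(cosine_similarity(sentence_vec, t) for t in term_vecs)
        if best >= threshold:
            matches[domain] = best
    return matches


# Toy stand-ins for embeddings; real embeddings have hundreds of dimensions.
toy_dictionary = {
    "negative_valence": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "sensorimotor": [[0.0, 0.0, 1.0]],
}
toy_sentence = [0.8, 0.6, 0.0]

matched = match_domains(toy_sentence, toy_dictionary)
print(matched)
```

With these toy vectors only the hypothetical "negative_valence" domain clears the 0.3 threshold, because the sentence vector is orthogonal to the single "sensorimotor" term.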
The sentence transformer model achieved high macro F1 scores across all RDoC domains, with the best performance at a cosine similarity threshold of 0.3, where the F1 score was at least 80% in every RDoC domain. The study revealed consistent reductions in all six RDoC domains among PTSD patients after psychotherapy. Overall, 57.3% of PTSD patients had at least one abnormal instance across the six RDoC domains in our records. Among women with PTSD, 60.6% had at least one abnormal instance, compared with 51.3% of men with PTSD, and 45.1% of women showed higher levels of sensorimotor disturbances, compared with 41.3% of men. Veterans had higher rates of abnormalities in the negative and positive valence systems (60% and 51.9%, respectively) than non-veterans (59.1% and 49.2%, respectively). The domains following a first diagnosis of PTSD were associated with heightened cue reactivity to trauma, suicide, alcohol, and substance consumption.
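Choosing a similarity threshold by macro F1 can be sketched as follows. This is an illustrative example under stated assumptions: the helper names are hypothetical, the labels are binary (1 for an RDoC instance, 0 otherwise), and the scores and labels are toy values constructed for the example, not data or results from the study.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives means precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def best_threshold(scores, y_true, thresholds):
    """Score each candidate threshold and return (best threshold, all results)."""
    results = {}
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        results[th] = macro_f1(y_true, y_pred)
    return max(results, key=results.get), results


# Toy similarity scores and gold labels, invented purely for illustration.
toy_scores = [0.10, 0.25, 0.35, 0.45, 0.60, 0.20]
toy_labels = [0, 0, 1, 1, 1, 0]

best, results = best_threshold(toy_scores, toy_labels, [0.1, 0.3, 0.5])
print(best, results)
```

Macro averaging weights every class equally, so a threshold that labels nearly everything positive (or negative) is penalized even when one class dominates the notes.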
The findings provide initial insights into RDoC functioning across different populations and disease trajectories. Natural language processing proves valuable for capturing real-time, context-dependent RDoC instances from large volumes of clinical notes.