Littman Matthew, Nguyen Huy-Binh, Campbell Joanna, Keyloun Katelyn R
AbbVie, North Chicago, IL, USA.
Brain Inform. 2025 May 22;12(1):12. doi: 10.1186/s40708-025-00258-x.
In real-world psychiatric practice, patients may experience complex treatment journeys that include various diagnoses and lines of therapy. Insurance claims databases could provide insight into the outcomes of psychiatric treatment processes, but the diversity of event sequences restricts analyses with currently available methods. Here, we developed a novel kernel k-means clustering algorithm for event sequences that can accommodate highly diverse event types and sequence lengths. The approach, Divisive Optimized Clustering using Kernel K-means for Event Sequences (DOCKKES), also leverages a novel performance metric, the transition score, which measures sequence coherence within individual clusters. The performance of DOCKKES was evaluated in the context of bipolar I disorder, which is characterized by heterogeneous treatment journeys. We conducted a retrospective, observational analysis of a large sample (n = 31,578) of patients with bipolar I disorder from the MarketScan® Commercial Database. Using insurance claims, bipolar episode diagnoses and mental health-related lines of therapy were identified as events of interest for patient clustering. The dataset included 202,122 events; 75% of the cohort experienced unique treatment journeys. Based on an optimal run, DOCKKES identified 16 treatment journey clusters, which were evenly split between journeys with an initial manic/mixed episode and journeys with an initial depressive episode (8 clusters each) and varied in sequence length and early lines of therapy. Variability across clusters was also observed for demographics, comorbidities, and mental health-related healthcare resource utilization and cost. This proof-of-concept study demonstrated the use of DOCKKES for integrating information from large datasets, enabling comparisons between patient clusters and evaluation of real-world treatment journeys in the context of evidence-based guidelines.
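To make the idea of kernel k-means on event sequences concrete, the following is a minimal sketch and not the DOCKKES implementation. It assumes a cosine-normalized bigram (transition-count) kernel, which is one simple way to compare sequences of different lengths, and it runs plain kernel k-means with a deterministic round-robin initialization. The divisive optimization and the transition score described in the abstract are not reproduced here. The function names and the event labels (`manic`, `depressive`, `LOT_A`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import Counter

import numpy as np


def bigram_counts(seq):
    """Count adjacent event pairs (transitions) in one sequence."""
    return Counter(zip(seq, seq[1:]))


def bigram_kernel(seqs):
    """Cosine-normalized bigram kernel matrix (an assumed, illustrative kernel)."""
    feats = [bigram_counts(s) for s in seqs]
    n = len(seqs)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Dot product of the two sparse bigram-count vectors.
            K[i, j] = K[j, i] = sum(c * feats[j][t] for t, c in feats[i].items())
    norm = np.sqrt(np.diag(K))
    norm[norm == 0] = 1.0  # sequences with fewer than two events have no bigrams
    return K / np.outer(norm, norm)


def kernel_kmeans(K, k, n_iter=50):
    """Plain kernel k-means on a precomputed kernel matrix K."""
    n = K.shape[0]
    labels = np.arange(n) % k  # deterministic round-robin start (a simplification)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays at infinite distance
            # Squared feature-space distance from every point to the cluster mean.
            dist[:, c] = (
                np.diag(K)
                - 2.0 * K[:, members].sum(axis=1) / size
                + K[np.ix_(members, members)].sum() / size**2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Hypothetical treatment journeys: an initial episode followed by lines of therapy.
journeys = [
    ["manic", "LOT_A", "LOT_B"],
    ["manic", "LOT_A", "LOT_B", "LOT_C"],
    ["manic", "LOT_A"],
    ["depressive", "LOT_D", "LOT_E"],
    ["depressive", "LOT_D"],
    ["depressive", "LOT_D", "LOT_E", "LOT_F"],
]
K = bigram_kernel(journeys)
labels = kernel_kmeans(K, k=2)
print(labels)
```

Because the algorithm only needs the kernel matrix, sequences never have to be padded or truncated to a common length. A different sequence kernel could be substituted for `bigram_kernel` without changing `kernel_kmeans`.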