Vaccine and Drug Evaluation Centre, Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
Pharmacol Res Perspect. 2020 Dec;8(6):e00687. doi: 10.1002/prp2.687.
Characterizing long-term prescription data is challenging due to the time-varying nature of drug use. Conventional approaches summarize time-varying data into categorical variables based on simple measures, such as cumulative dose, while ignoring patterns of use. This loss of information can lead to misclassification and biased estimates of the exposure-outcome association. We introduce a classification method that characterizes longitudinal prescription data with an unsupervised machine learning algorithm. We used administrative databases covering virtually all 1.3 million residents of Manitoba, explicitly designed features to describe the average dose, proportion of days covered (PDC), dose change, and dose variability, and partitioned the resulting feature space using K-means clustering. We applied this method to metformin use in patients with diabetes. We identified 27,786 metformin users and showed that the feature distributions of their metformin use are stable across varying lengths of follow-up and that these distributions have clear interpretations. We found six distinct metformin user groups: patients with intermittent use, decreasing dose, increasing dose, high dose, and two medium dose groups (one with stable dose and one with highly variable use). Patients in the varying and decreasing dose groups were more likely to experience progression of diabetes than other patients. The method presented in this paper characterizes drug use into distinct and clinically relevant groups that cannot be obtained by merely classifying use by quantiles of overall use.
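The feature-then-cluster approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give exact feature definitions, so the function names (`usage_features`, `standardize`, `kmeans`), the use of a least-squares slope for dose change, the standard deviation on covered days for dose variability, and the z-score scaling are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def usage_features(daily_dose):
    """Summarize one patient's daily dose series (0 = no supply that day).

    Returns [average dose, PDC, dose change, dose variability].
    All four definitions are assumptions made for illustration.
    """
    n = len(daily_dose)
    covered = [d for d in daily_dose if d > 0]
    pdc = len(covered) / n                          # proportion of days covered
    avg_dose = mean(covered) if covered else 0.0    # mean dose on covered days
    variability = pstdev(covered) if covered else 0.0
    # Dose change: least-squares slope of dose against day index.
    t_bar = (n - 1) / 2
    d_bar = mean(daily_dose)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (d - d_bar) for t, d in enumerate(daily_dose))
    slope = sxy / sxx if sxx else 0.0
    return [avg_dose, pdc, slope, variability]


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(c) for c in zip(*members)]
    return labels, centers


# Toy usage with made-up daily doses in mg (not data from the study).
patients = [
    [500] * 30,                  # stable dose
    [500] * 15 + [1000] * 15,    # increasing dose
    [1000] * 15 + [500] * 15,    # decreasing dose
    [500, 0, 0] * 10,            # intermittent use
]
features = standardize([usage_features(p) for p in patients])
labels, _ = kmeans(features, k=2)
```

In practice one would more likely use an established implementation such as scikit-learn's `KMeans`, and choose the number of clusters (six in the study) with a model-selection criterion rather than fixing it in advance as this toy example does.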