Rahman Quazi Abidur, Janmohamed Tahir, Pirbaglou Meysam, Clarke Hance, Ritvo Paul, Heffernan Jane M, Katz Joel
Centre for Disease Modelling, Department of Mathematics and Statistics, York University, Toronto, ON, Canada.
ManagingLife, Inc, Toronto, ON, Canada.
J Med Internet Res. 2018 Nov 15;20(11):e12001. doi: 10.2196/12001.
Measuring and predicting pain volatility (fluctuation or variability in pain scores over time) can help improve pain management. Perceptions of pain and its consequent disabling effects are often heightened under the conditions of greater uncertainty and unpredictability associated with pain volatility.
This study aimed to use data mining and machine learning methods to (1) define a new measure of pain volatility and (2) predict future pain volatility levels in users of the pain management app, Manage My Pain, based on demographic, clinical, and app use features.
Pain volatility was defined as the mean of absolute changes between 2 consecutive self-reported pain severity scores within the observation periods. The k-means clustering algorithm was applied to users' pain volatility scores at the first and sixth months of app use to establish a threshold discriminating low from high volatility classes. Subsequently, we extracted 130 demographic, clinical, and app usage features from the first month of app use to predict these 2 volatility classes at the sixth month of app use. Prediction models were developed using 4 methods: (1) logistic regression with ridge estimators; (2) logistic regression with Least Absolute Shrinkage and Selection Operator; (3) Random Forests; and (4) Support Vector Machines. Overall prediction accuracy and accuracy for both classes were calculated to compare the performance of the prediction models. Training and testing were conducted using 5-fold cross-validation. A class imbalance issue was addressed using random subsampling of the training dataset. Users with at least 5 pain records in both the predictor and outcome periods (N=782 users) were included in the analysis.
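The volatility measure, the clustering threshold, and the subsampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the threshold is taken here as the midpoint between the two cluster centroids (the abstract does not say how the threshold was derived from the clusters), and the subsampling is assumed to shrink the majority class to the size of the minority class.

```python
import random
from statistics import mean


def pain_volatility(scores):
    """Mean absolute change between consecutive pain severity scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to measure change")
    return mean(abs(b - a) for a, b in zip(scores, scores[1:]))


def kmeans_1d_threshold(values, iterations=100):
    """Two-cluster k-means on 1D values.

    Returns the midpoint between the two centroids (an assumed way of
    turning the clusters into a single cut-off).
    """
    lo, hi = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to separate
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def undersample_majority(labels, seed=0):
    """Return sorted indices with every class reduced to the minority size.

    Assumption: balanced undersampling; the abstract only says that a
    random subsample of the training data was used.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(ix) for ix in by_class.values())
    keep = []
    for ix in by_class.values():
        keep.extend(rng.sample(ix, n))
    return sorted(keep)
```

For example, a user reporting scores 3, 5, 4, 8 has absolute changes of 2, 1, and 4, giving a volatility of 7/3, or about 2.33. In a cross-validation setting, the subsampling would be applied to the training folds only, so that the test fold keeps the original class proportions.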
The k-means clustering algorithm was applied to pain volatility scores to establish a threshold of 1.6 to differentiate between low and high volatility classes. After validating the threshold using random subsamples, 2 classes were created: low volatility (n=611) and high volatility (n=171). In this class-imbalanced dataset, all 4 prediction models achieved 78.1% (611/782) to 79.0% (618/782) in overall accuracy. However, all models had a prediction accuracy of less than 18.7% (32/171) for the high volatility class. After addressing the class imbalance issue using random subsampling, results improved across all models for the high volatility class to greater than 59.6% (102/171). The prediction model based on Random Forests performed the best, as it consistently achieved approximately 70% accuracy for both classes across 3 random subsamples.
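The gap between overall and per-class accuracy follows directly from the class sizes. The sketch below uses a hypothetical helper, `class_accuracies`, and a made-up predictor that labels every user as low volatility; it does not reproduce any of the study's models, but it shows that such a trivial baseline already reaches 611/782, or 78.1%, overall accuracy while identifying no high volatility users.

```python
def class_accuracies(y_true, y_pred):
    """Return overall accuracy and a dict of per-class accuracies."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for cls in set(y_true):
        # Predictions for the users whose true class is cls
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        per_class[cls] = sum(p == cls for p in preds) / len(preds)
    return overall, per_class


# Class sizes reported in the results: 611 low, 171 high volatility users.
y_true = ["low"] * 611 + ["high"] * 171
# Illustrative baseline only: predict "low" for everyone.
y_pred = ["low"] * 782

overall, per_class = class_accuracies(y_true, y_pred)
print(f"overall: {overall:.1%}")
print(f"low: {per_class['low']:.1%}, high: {per_class['high']:.1%}")
```

This is why the accuracy for each class is reported separately alongside the overall figure.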
We propose a novel method for measuring pain volatility. Cluster analysis was applied to divide users into subsets of low and high volatility classes. These classes were then predicted at the sixth month of app use with an acceptable degree of accuracy using machine learning methods based on the features extracted from demographic, clinical, and app use information from the first month.