Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods.

Authors

Rahman Quazi Abidur, Janmohamed Tahir, Clarke Hance, Ritvo Paul, Heffernan Jane, Katz Joel

Affiliations

Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada.

Centre for Disease Modelling, Department of Mathematics and Statistics, York University, Toronto, ON, Canada.

Publication information

JMIR Med Inform. 2019 Nov 20;7(4):e15601. doi: 10.2196/15601.

DOI:10.2196/15601

PMID:31746764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6913759/

Abstract

BACKGROUND

Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue.

OBJECTIVE

This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue.

METHODS

A total of 132 features were extracted from the first month of app use to develop machine learning-based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than the other members of the large feature set used to develop the prediction models: (1) the Gini impurity criterion; (2) the information gain criterion; and (3) Boruta. We then combined the three groups of important features identified by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with the least absolute shrinkage and selection operator (LASSO); and (3) random forests. To address class imbalance in the dataset, the majority class was randomly under-sampled multiple times, and a majority voting approach was then used to consolidate the prediction results from these multiple subsamples. The study included 879 users with a total of 391,255 pain records.
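To make the two impurity-based criteria named above concrete, the following minimal Python sketch computes the Gini impurity decrease and the information gain for a single candidate split. It is an illustration under stated assumptions, not the authors' implementation: the toy feature values, the labels, the split point of 1.5, and the helper names `gini`, `entropy`, and `split_gain` are all hypothetical, and Boruta is not shown.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity decrease obtained by splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


# Hypothetical toy data: one feature observed for 4 "low" and 4 "high" users.
toy_feature = [0.4, 0.9, 1.1, 1.3, 1.8, 2.2, 2.5, 3.0]
toy_labels = ["low"] * 4 + ["high"] * 4

gini_gain = split_gain(toy_feature, toy_labels, 1.5, gini)     # Gini criterion
info_gain = split_gain(toy_feature, toy_labels, 1.5, entropy)  # information gain
print(gini_gain, info_gain)
```

On this perfectly separable toy split, both criteria reach their two-class maximum (0.5 for Gini impurity and 1 bit for information gain). Tree-based importance scores typically aggregate such decreases over many splits, so a feature that repeatedly yields large decreases ranks as more important.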

RESULTS

A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy was approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using the 3 feature selection methods. Of these 9 features, 2 were from the app use category and the other 7 were related to pain statistics. After models developed using random subsamples were consolidated by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests did not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features were included in the prediction model.
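The figures reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the Results; the variable names and the `pct` helper are hypothetical, and the majority-class share is added purely as context for the degree of imbalance.

```python
# Class counts and correct-prediction counts as stated in the Results.
n_low, n_high = 694, 185
n_total = n_low + n_high


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


acc_lower = pct(601, n_total)        # the lower of the two reported accuracies
acc_higher = pct(618, n_total)       # the higher of the two reported accuracies
users_difference = 618 - 601         # users classified differently in total
imbalance_ratio = n_low / n_high     # low-volatility users per high-volatility user
majority_share = pct(n_low, n_total) # share of users in the low-volatility class

print(n_total, acc_lower, acc_higher, users_difference)
print(round(imbalance_ratio, 2), majority_share)
```

The two class counts sum to the 879 users in the study, and the two accuracies differ by 17 users. The low volatility class outnumbers the high volatility class by roughly 3.75 to 1, which is why overall accuracy alone says little about how well the high volatility class is predicted.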

CONCLUSIONS

We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
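The consolidation step described above can be sketched as follows: draw several balanced subsamples by randomly under-sampling the majority class, fit one model per subsample, and take the majority vote of the per-model predictions for each user. This is a minimal illustration under stated assumptions, not the study's code. The synthetic data, the number of subsamples (`n_models`), and the one-feature midpoint classifier (`fit_threshold`, a stand-in for logistic regression or random forests) are all hypothetical.

```python
import random
from collections import Counter


def undersample(labels, rng):
    """Indices of a balanced subsample: all minority-class items plus an
    equally sized random draw from the majority class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    minority_size = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, minority_size))
    return sorted(chosen)


def fit_threshold(xs, ys):
    """Stand-in classifier: the midpoint between the two class means."""
    low = [x for x, y in zip(xs, ys) if y == "low"]
    high = [x for x, y in zip(xs, ys) if y == "high"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2.0


def majority_vote(votes):
    """Most common label among the per-model predictions for one user."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
# Hypothetical imbalanced data: 40 "low" users and 10 "high" users.
xs = [rng.gauss(1.0, 0.5) for _ in range(40)] + [rng.gauss(2.5, 0.5) for _ in range(10)]
ys = ["low"] * 40 + ["high"] * 10

n_models = 5  # odd, so a two-class vote cannot tie
thresholds = []
for _ in range(n_models):
    idx = undersample(ys, rng)
    thresholds.append(fit_threshold([xs[i] for i in idx], [ys[i] for i in idx]))

# One consolidated prediction per user, from the votes of all models.
consolidated = [
    majority_vote(["high" if x > t else "low" for t in thresholds])
    for x in xs
]
```

Each subsample keeps every minority-class user, so no high volatility example is discarded, while the repeated draws let most majority-class users contribute to at least one model. An odd number of models avoids ties in a two-class vote.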

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b8b/6913759/27277e92e05e/medinform_v7i4e15601_fig1.jpg
