Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features.

Authors

Guo Ruihua, Smith Ross, Chen Qifan, Ritchie Angus, Poon Simon

Affiliations

School of Computer Science, The University of Sydney, Sydney, NSW, Australia.

Population Health Group, Australian Institute of Health and Welfare, Canberra, ACT, Australia.

Publication information

Digit Health. 2025 Jan 17;11:20552076251314097. doi: 10.1177/20552076251314097. eCollection 2025 Jan-Dec.

DOI:10.1177/20552076251314097

PMID:39839956

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11748077/

Abstract

OBJECTIVE

Machine learning (ML) has enabled healthcare discoveries by facilitating efficient modeling, such as for cancer screening. Unlike clinical trial data, the real-world data used in ML are often gathered for multiple purposes, leading to bias and missing information for a specific classification task. This challenge is especially pronounced in healthcare because of stringent ethical considerations and resource constraints. This study proposed an integrated approach to enhance the quality of health evidence from a classification task: predicting Medicare's Diagnosis-Related Groups for ischemic heart disease (IHD) patients.

METHODS

Eligible participants were identified from the Medical Information Mart for Intensive Care IV (MIMIC IV), a publicly available hospital database. Six ML models were selected for model triangulation. Sequential triangulation was employed via Local Process Mining (LPM) and Qualitative Comparative Analysis (QCA).
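Model triangulation compares several models on the same cases. The sketch below is a minimal illustration, not the authors' implementation: it takes per-model predictions and reports which models misclassify each case, plus the share of cases misclassified by at least one model. The function names, model names, and toy labels are all hypothetical.

```python
from typing import Dict, List, Sequence


def disagreeing_models(y_true: Sequence[int],
                       predictions: Dict[str, Sequence[int]]) -> List[List[str]]:
    """For each case, list the names of the models that misclassified it."""
    n = len(y_true)
    for name, preds in predictions.items():
        if len(preds) != n:
            raise ValueError(f"{name} has {len(preds)} predictions, expected {n}")
    return [
        [name for name, preds in predictions.items() if preds[i] != y_true[i]]
        for i in range(n)
    ]


def any_model_error_rate(y_true: Sequence[int],
                         predictions: Dict[str, Sequence[int]]) -> float:
    """Share of cases that at least one model misclassifies."""
    if len(y_true) == 0:
        raise ValueError("y_true is empty")
    wrong_by_case = disagreeing_models(y_true, predictions)
    flagged = sum(1 for wrong in wrong_by_case if wrong)
    return flagged / len(y_true)


# Toy data (hypothetical): five cases, three models instead of six.
y_true = [0, 1, 1, 0, 2]
predictions = {
    "model_a": [0, 1, 1, 0, 2],
    "model_b": [0, 1, 0, 0, 2],
    "model_c": [0, 1, 1, 0, 1],
}

wrong_by_case = disagreeing_models(y_true, predictions)
rate = any_model_error_rate(y_true, predictions)
print(wrong_by_case)
print(rate)
```

Cases flagged by any model are natural candidates for closer, case-based follow-up, which is the role the abstract assigns to the later triangulation steps.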

RESULTS

A total of 1545 IHD hospitalizations from 916 patients were identified in MIMIC IV. Eight health process features, aligned with clinical knowledge, were identified through LPM. The correlation coefficients for process features, ranging from 0.24 to 0.42, were higher than those for non-process features, which ranged from 0.02 to 0.36. A total of 56 unique combinations were identified from the QCA, with 28 configurations having raw coverage lower than 1.0%. The overall model performance (i.e. weighted F1 and area under the curve scores) increased after adopting this integrated approach. The proportion of cases misclassified by any of the six models decreased by 47% after incorporating process features (from 5.29% to 2.91%) and further decreased to 0.0% after applying the QCA solutions.
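Raw coverage in QCA measures how much of the outcome a single configuration accounts for. The sketch below assumes crisp-set QCA (binary conditions), which the abstract does not specify, and uses the standard definitions: raw coverage is the share of outcome-positive cases that match the configuration, and consistency is the share of matching cases that show the outcome. The condition names `A` and `B`, the outcome name `Y`, and the toy cases are hypothetical.

```python
from typing import Dict, List

Case = Dict[str, int]  # condition or outcome name -> 0 or 1


def matches(case: Case, config: Dict[str, int]) -> bool:
    """True if the case has every condition value the configuration requires."""
    return all(case.get(name) == value for name, value in config.items())


def raw_coverage(cases: List[Case], config: Dict[str, int], outcome: str) -> float:
    """Share of outcome-positive cases that the configuration covers."""
    with_outcome = [c for c in cases if c[outcome] == 1]
    if not with_outcome:
        raise ValueError("no case shows the outcome")
    covered = sum(1 for c in with_outcome if matches(c, config))
    return covered / len(with_outcome)


def consistency(cases: List[Case], config: Dict[str, int], outcome: str) -> float:
    """Share of cases matching the configuration that show the outcome."""
    members = [c for c in cases if matches(c, config)]
    if not members:
        raise ValueError("no case matches the configuration")
    return sum(c[outcome] for c in members) / len(members)


# Toy data (hypothetical): two binary conditions and one binary outcome.
cases = [
    {"A": 1, "B": 1, "Y": 1},
    {"A": 1, "B": 0, "Y": 1},
    {"A": 1, "B": 1, "Y": 0},
    {"A": 0, "B": 1, "Y": 1},
    {"A": 0, "B": 0, "Y": 0},
]
config = {"A": 1, "B": 1}

cov = raw_coverage(cases, config, "Y")
cons = consistency(cases, config, "Y")
print(cov, cons)
```

A configuration with raw coverage below 1.0%, as reported for 28 of the 56 combinations, explains only a small fraction of the outcome-positive cases, so it describes rare pathways rather than common ones.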

CONCLUSION

The integrated approach demonstrates its ability to enhance the quality of a classification task through its clinical relevance, improved model performance, and reduced case-level error rates. However, more scalable QCA methods are needed for larger datasets. Developing health process feature engineering for broader applications is a possible future direction.

