Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance.

Author affiliations

Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America.

The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America.

Publication information

PLoS One. 2020 Apr 23;15(4):e0231300. doi: 10.1371/journal.pone.0231300. eCollection 2020.

DOI:10.1371/journal.pone.0231300

PMID:32324754

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7179831/

Abstract

Incorporating expert knowledge when machine learning models are trained holds promise for producing models that are easier to interpret. The main objectives of this study were to use a feature engineering approach to incorporate clinical expert knowledge before applying machine learning techniques, and to assess the impact of that approach on model complexity and performance. Four machine learning models were trained to predict mortality in a severe asthma case study. Experiments that selected fewer input features based on a discriminative score showed low to moderate precision for discovering clinically meaningful triplets, indicating that the discriminative score alone cannot replace clinical input. Compared with baseline machine learning (ML) models, model complexity decreased when fewer features were used, informed by the discriminative score, and when laboratory features were filtered with clinical input. Performance on the mortality prediction task differed only slightly between baseline ML models and models that used filtered features. Encoding demographic and triplet information in ML models with filtered features appeared to improve performance over the baseline. These findings indicate that the use of filtered features may reduce model complexity with little impact on performance.
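The idea of ranking features by a discriminative score and then filtering them with clinical input can be illustrated with a minimal sketch. The abstract does not define the score, so the formula below (gap between class means divided by the summed class standard deviations) is an assumption, not the authors' method. The feature names, toy values, and function names are likewise hypothetical.

```python
from statistics import mean, pstdev


def discriminative_score(values, labels):
    """Assumed score: |mean(died) - mean(survived)| / (std(died) + std(survived))."""
    died = [v for v, y in zip(values, labels) if y == 1]
    survived = [v for v, y in zip(values, labels) if y == 0]
    gap = abs(mean(died) - mean(survived))
    spread = pstdev(died) + pstdev(survived)
    if spread == 0:
        # No within-class variation: perfectly separating or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_features(columns, labels, k, allowed=None):
    """Return the k highest-scoring feature names.

    If `allowed` is given, only features a clinician approved are ranked.
    """
    names = [n for n in columns if allowed is None or n in allowed]
    ranked = sorted(
        names,
        key=lambda n: discriminative_score(columns[n], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "lactate": [4.0, 5.0, 6.0, 1.0, 1.5, 2.0],
    "sodium": [140, 138, 141, 139, 140, 142],
    # Spuriously discriminative, but not clinically meaningful.
    "specimen_batch": [23, 22, 21, 3, 4, 5],
}

score_only = select_features(columns, labels, k=2)
with_clinical_filter = select_features(
    columns, labels, k=2, allowed={"lactate", "sodium"}
)
print(score_only)            # ['specimen_batch', 'lactate']
print(with_clinical_filter)  # ['lactate', 'sodium']
```

In this toy case the score alone ranks a clinically meaningless feature first, while the clinician-approved list removes it. That mirrors the abstract's point that a discriminative score cannot replace clinical input. Both selections keep two of three features, so the downstream model has fewer inputs either way.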
