Classifying early infant feeding status from clinical notes using natural language processing and machine learning.

Affiliations

Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, 2004 Mowry Road, Clinical and Translational Research Building, Gainesville, FL, 32610, USA.

Department of Obstetrics and Gynecology, University of Florida College of Medicine, Gainesville, FL, 32610, USA.

Publication information

Sci Rep. 2024 Apr 3;14(1):7831. doi: 10.1038/s41598-024-58299-x.

Abstract

The objective of this study was to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Notes were annotated in TeamTat so that each clinical note was assigned a single infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost (gradient boosting), k-nearest neighbors, and support-vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. Our modeling corpus contained an equal number of clinical notes in each class, giving a balanced sample. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification in this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after balancing and downsampling. The XGBoost model outperformed all others, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in electronic health records to classify infant feeding status. Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
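The abstract does not give implementation details, so the following is only a minimal sketch of two steps it names: downsampling to a class-balanced corpus, and scoring a classifier with accuracy and macro-averaged precision, recall, and F1. The function names `downsample` and `macro_scores` are hypothetical, the toy labels are invented purely for illustration, and nothing here reproduces the study's data, models, or results.

```python
import random
from collections import Counter, defaultdict


def downsample(labels, seed=0):
    """Return sorted indices that keep an equal number of notes per class.

    Every class is randomly reduced to the size of the smallest class.
    (Assumed strategy; the abstract only says "balancing and down sampling".)
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)
    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], n_min))
    return sorted(kept)


def macro_scores(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    Macro averaging takes the unweighted mean of the per-class scores,
    so every class counts equally regardless of its size.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, invented labels for the three classes used in the final analysis.
labels = ["breast"] * 5 + ["formula"] * 3 + ["missing"] * 4
kept = downsample(labels)
balanced_counts = Counter(labels[i] for i in kept)

# Toy, invented predictions from a hypothetical classifier.
y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
scores = macro_scores(y_true, y_pred)
print(balanced_counts)
print(scores)
```

On a balanced corpus such as the one described, macro-averaged recall equals overall accuracy, because every class contributes the same number of notes to both quantities; this is consistent with the two figures reported for XGBoost being identical.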
