Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records.

Author information

Soguero-Ruiz Cristina, Hindberg Kristian, Rojo-Alvarez Jose Luis, Skrovseth Stein Olav, Godtliebsen Fred, Mortensen Kim, Revhaug Arthur, Lindsetmo Rolv-Ole, Augestad Knut Magne, Jenssen Robert

Publication information

IEEE J Biomed Health Inform. 2016 Sep;20(5):1404-15. doi: 10.1109/JBHI.2014.2361688. Epub 2014 Oct 8.

Abstract

The free text in electronic health records (EHRs) conveys a large amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques to extract this information, little effort has been invested in feature selection and in the medical interpretation of the selected features. In this study, we focus on the early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC), using free text extracted from EHRs. We use a bag-of-words model to investigate the potential of feature selection strategies. The purpose is earlier detection of AL, and prediction of AL from data generated in the EHR before the actual complication occurs. Because the data are high-dimensional, we derive feature selection strategies from the robust linear maximum-margin support vector machine (SVM) classifier, investigating: 1) a simple statistical criterion (a leave-one-out-based test); 2) a computationally intensive statistical criterion (bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal discriminatory power for early detection of complications after CRC surgery (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision-making phase.
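The abstract names a bag-of-words representation and a bootstrap criterion on linear SVM weights but gives no implementation details. The following is a minimal Python sketch of that general idea under our own assumptions, not the authors' method: binary bag-of-words features, a bias-free linear SVM fitted with Pegasos-style stochastic subgradient descent, stratified bootstrap resampling, and a 95% percentile interval per weight, keeping a word when its interval excludes zero. The function names, hyperparameters (`lam`, `epochs`, `n_boot`, `alpha`) and the toy notes and labels are hypothetical.

```python
import random

# Hypothetical toy clinical notes (invented tokens, not data from the study).
# Label +1 = anastomosis leakage (AL), -1 = no AL.
NOTES = [
    "fever leak crp rising pain",
    "leak abdominal pain fever",
    "crp rising leak drain",
    "leak drain fever",
    "pain leak crp",
    "fever leak",
    "pain normal mobilized",
    "normal diet mobilized",
    "normal discharge planned",
    "pain normal diet",
    "normal wound healing",
    "normal discharge",
]
LABELS = [1] * 6 + [-1] * 6


def bag_of_words(docs):
    """Binary bag-of-words: one column per vocabulary word, 1.0 if present."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.lower().split():
            row[index[w]] = 1.0
        X.append(row)
    return vocab, X


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Bias-free linear SVM fitted by Pegasos-style stochastic subgradient
    descent on the L2-regularized hinge loss (an assumed solver choice)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def bootstrap_select(X, y, n_boot=50, alpha=0.05, seed=0):
    """Keep features whose bootstrap percentile interval for the SVM weight
    excludes zero. Resampling is stratified, so every replicate has both classes."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label > 0]
    neg = [i for i, label in enumerate(y) if label < 0]
    draws = [[] for _ in range(len(X[0]))]
    for b in range(n_boot):
        idx = [rng.choice(pos) for _ in pos] + [rng.choice(neg) for _ in neg]
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=b)
        for j, wj in enumerate(w):
            draws[j].append(wj)
    selected = []
    for j, ws in enumerate(draws):
        ws.sort()
        lo = ws[int(alpha / 2 * n_boot)]            # lower percentile bound
        hi = ws[int((1 - alpha / 2) * n_boot) - 1]  # upper percentile bound
        if lo > 0 or hi < 0:
            selected.append(j)
    return selected


vocab, X = bag_of_words(NOTES)
selected = [vocab[j] for j in bootstrap_select(X, LABELS)]
print("selected words:", selected)
```

In this toy setup, a word that appears in every note of one class and in none of the other (here `leak` and `normal`) always receives a weight of consistent sign, so its interval cannot contain zero. A leave-one-out or kernel-entropy criterion could be swapped in at the interval test in `bootstrap_select`.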
