Early detection of colorectal cancer by leveraging Dutch primary care consultation notes with free text embeddings.

Affiliations

Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Department of General Practice/Family Medicine, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.

Publication information

Sci Rep. 2023 Jul 4;13(1):10760. doi: 10.1038/s41598-023-37397-2.

Abstract

We aimed to assess the added predictive performance that free-text Dutch consultation notes provide in detecting colorectal cancer in primary care, compared with currently used models. We developed, evaluated and compared three prediction models for colorectal cancer (CRC) in a large primary care database with 60,641 patients. The prediction model with both known predictive features and free-text data (TabTxt; AUROC: 0.823) performs statistically significantly better (p < 0.05) than the other two models, which use only tabular data (as in current practice) or only text data (AUROC Tab: 0.767; Txt: 0.797). The specificity of the two models that use demographics and known CRC features (Tab: 0.321; TabTxt: 0.335) is higher than that of the model with only free text (Txt: 0.234). The Txt and, to a lesser degree, TabTxt models are well calibrated, while the Tab model shows slight underprediction at both tails. As expected with an outcome prevalence below 0.01, all models show poorly calibrated predictions in the extreme upper tail (top 1%). Free-text consultation notes show promise for improving predictive performance over established prediction models that use only structured features. A future clinical implication for our CRC use case is that such an improvement may help lower the number of referrals of patients with suspected CRC to medical specialists.
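
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how AUROC and specificity can be computed from predicted risks. It is not the authors' code: the function names, the toy labels and scores, and the 0.3 decision threshold are all assumptions made for illustration, and the abstract does not state the operating point at which specificity was measured.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(labels, scores, threshold):
    """Share of negative cases correctly classified as negative when a
    case is flagged positive if its score is >= threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one negative case")
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Toy data, invented for illustration only (not from the study).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.40, 0.70, 0.35, 0.90]

print("AUROC:", auroc(labels, scores))
print("Specificity at 0.3:", specificity(labels, scores, 0.3))
```

The pairwise definition is quadratic in the number of cases, which is fine for a toy example; on a cohort of the size reported here, a rank-based implementation would be the more practical choice.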
