
BACKGROUND

The electronic medical record contains a wealth of information buried in free text. We created a natural language processing algorithm to identify patients with atrial fibrillation (AF) using text alone.

METHODS AND RESULTS

We created 3 data sets from patients with at least one AF billing code from 2010 to 2017: a training set (n=886), an internal validation set from site no. 1 (n=285), and an external validation set from site no. 2 (n=276). A team of clinicians reviewed and adjudicated patients as AF present or absent, which served as the reference standard. We trained 54 algorithms to classify each patient, varying the model, number of features, number of stop words, and the method used to create the feature set. The algorithm with the highest F-score (the harmonic mean of sensitivity and positive predictive value) in the training set was applied to the validation sets. F-scores and areas under the receiver operating characteristic curve were compared between site no. 1 and site no. 2 using bootstrapping. Adjudicated AF prevalence was 75.1% at site no. 1 and 86.2% at site no. 2. Among the 54 algorithms, the best-performing model was logistic regression, using 1000 features, 100 stop words, and the term frequency-inverse document frequency method to create the feature set, with sensitivity 92.8%, specificity 93.9%, and an area under the receiver operating characteristic curve of 0.93 in the training set. The performance at site no. 1 was sensitivity 92.5%, specificity 88.7%, with an area under the receiver operating characteristic curve of 0.91. The performance at site no. 2 was sensitivity 89.5%, specificity 71.1%, with an area under the receiver operating characteristic curve of 0.80. The F-score was lower at site no. 2 compared with site no. 1 (92.5% [SD, 1.1%] versus 94.2% [SD, 1.1%]; P<0.001).
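The F-score definition and the bootstrap comparison described above can be illustrated with a short sketch. This is not the authors' code: the function names, the resample count, and the seed are assumptions chosen for illustration, and the positive predictive value is derived from the reported sensitivity, specificity, and prevalence via Bayes' rule rather than taken from the abstract.

```python
import random
import statistics


def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule (derived, not reported)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def f_score_from_labels(truth, pred):
    """F-score from paired binary labels; equals 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f_scores(truth, pred, n_resamples=1000, seed=0):
    """Resample patients with replacement and recompute the F-score each time.

    n_resamples and seed are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    pairs = list(zip(truth, pred))
    scores = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        t, p = zip(*sample)
        scores.append(f_score_from_labels(t, p))
    return scores


if __name__ == "__main__":
    # Reported sensitivity, specificity, and adjudicated prevalence per site.
    site1 = f_score(0.925, ppv_from_rates(0.925, 0.887, 0.751))
    site2 = f_score(0.895, ppv_from_rates(0.895, 0.711, 0.862))
    print(f"site 1 F-score: {site1:.3f}")
    print(f"site 2 F-score: {site2:.3f}")

    # Toy labels (hypothetical, not study data) to show the bootstrap SD.
    truth = [1] * 8 + [0] * 2
    pred = [1] * 7 + [0] + [1] + [0]
    scores = bootstrap_f_scores(truth, pred, n_resamples=500, seed=1)
    print(f"bootstrap SD on toy data: {statistics.stdev(scores):.3f}")
```

With the reported rates, the derived F-scores come out near 0.943 for site no. 1 and 0.922 for site no. 2, close to the published 94.2% and 92.5%; the small gap at site no. 2 is plausibly due to rounding of the reported inputs. The calculation also shows why the F-score stays above 90% at site no. 2 despite the drop in specificity: with 86.2% prevalence, false positives are drawn from a small pool of AF-absent patients, so the positive predictive value remains high.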

CONCLUSIONS

We developed a natural language processing algorithm to identify patients with AF using text alone, with >90% F-score at 2 separate sites. This approach allows better use of the clinical narrative and creates an opportunity for precise, high-throughput cohort identification.




Development of a Portable Tool to Identify Patients With Atrial Fibrillation Using Clinical Notes From the Electronic Medical Record.
