Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

Author information

Small Aeron M, Kiss Daniel H, Zlatsin Yevgeny, Birtwell David L, Williams Heather, Guerraty Marie A, Han Yuchi, Anwaruddin Saif, Holmes John H, Chirinos Julio A, Wilensky Robert L, Giri Jay, Rader Daniel J

Affiliations

Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Biomed Inform. 2017 Aug;72:77-84. doi: 10.1016/j.jbi.2017.06.016. Epub 2017 Jun 15.

Abstract

BACKGROUND

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest.

METHODS

We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes.

RESULTS

Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. Both approaches identified 4346 patients with TAS and 6024 patients with CAD. For each disease, a randomly selected sample of 200-250 patients identified only by text mining was compared with 200-250 patients identified only by billing codes. Text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 by ICD-9 for CAD.
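
The overlap arithmetic behind these counts can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function names `unique_counts` and `ppv` are hypothetical, the overlap of 4346 is assumed to refer to TAS, and the chart-review tallies passed to `ppv` are made-up numbers, since the abstract reports only the resulting PPVs.

```python
def unique_counts(n_text, n_icd, n_both):
    """Return (text-mining only, ICD-9 only, either method) patient counts."""
    text_only = n_text - n_both
    icd_only = n_icd - n_both
    either = n_text + n_icd - n_both
    return text_only, icd_only, either


def ppv(true_pos, false_pos):
    """Positive predictive value: confirmed cases / all flagged cases."""
    return true_pos / (true_pos + false_pos)


# Counts reported in the abstract: (text mining, ICD-9, both methods).
reported = {
    "TAS": (7115, 8272, 4346),
    "CAD": (9247, 6913, 6024),
}

for disease, (n_text, n_icd, n_both) in reported.items():
    text_only, icd_only, either = unique_counts(n_text, n_icd, n_both)
    print(f"{disease}: text-only={text_only}, ICD-only={icd_only}, either={either}")

# Hypothetical chart review: 190 confirmed out of 200 sampled patients.
print(f"example PPV = {ppv(190, 10):.2f}")
```

The "only" groups computed here are the pools from which the 200-250 patient samples would be drawn; the PPV is then the fraction of each sample confirmed on manual review.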

CONCLUSION

These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research.
