Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

Authors

Small Aeron M, Kiss Daniel H, Zlatsin Yevgeny, Birtwell David L, Williams Heather, Guerraty Marie A, Han Yuchi, Anwaruddin Saif, Holmes John H, Chirinos Julio A, Wilensky Robert L, Giri Jay, Rader Daniel J

Affiliations

Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Biomed Inform. 2017 Aug;72:77-84. doi: 10.1016/j.jbi.2017.06.016. Epub 2017 Jun 15.

DOI: 10.1016/j.jbi.2017.06.016
PMID: 28624641

Abstract

BACKGROUND

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest.

METHODS

We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes.
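The abstract does not spell out the clinical criteria or the Endeca queries behind PennSeek, so the following is only a minimal sketch of the general idea: rule-based screening of free-text reports, aggregated to the patient level. The patterns, the negation handling, and the names `flags_tas` and `patients_with_tas` are hypothetical illustrations, not the authors' method.

```python
import re

# Hypothetical patterns for illustration only; the actual PennSeek
# criteria are not given in the abstract.
AS_PATTERN = re.compile(r"\baortic (valve )?stenosis\b", re.IGNORECASE)
NEGATION = re.compile(r"\bno (evidence of )?aortic (valve )?stenosis\b", re.IGNORECASE)
BICUSPID = re.compile(r"\bbicuspid\b", re.IGNORECASE)


def flags_tas(report_text: str) -> bool:
    """Return True if a report mentions aortic stenosis without an
    explicit negation and without mention of a bicuspid valve."""
    if NEGATION.search(report_text):
        return False
    if BICUSPID.search(report_text):
        return False
    return AS_PATTERN.search(report_text) is not None


def patients_with_tas(reports):
    """Collect patient IDs with at least one flagged report.

    `reports` is an iterable of (patient_id, report_text) pairs.
    """
    return {pid for pid, text in reports if flags_tas(text)}


sample_reports = [
    ("p1", "Moderate aortic stenosis, trileaflet valve."),
    ("p1", "Normal LV function."),
    ("p2", "No evidence of aortic stenosis."),
    ("p3", "Bicuspid aortic valve with severe aortic stenosis."),
]
tas_patients = patients_with_tas(sample_reports)
```

Because many reports map to one patient (282,569 echocardiography reports for 81,164 individuals), the sketch returns a set of patient IDs rather than a list of matching reports.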

RESULTS

Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD.
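The counts above imply how many patients each approach identified on its own, which is the pool the validation samples were drawn from. The sketch below derives those numbers from the reported totals and shows how a PPV is computed from a chart-review sample. The function names are illustrative, and the 190-of-200 review counts are hypothetical, chosen only to reproduce a PPV of 0.95; the abstract reports sample sizes only as a range of 200-250.

```python
def unique_counts(n_text: int, n_icd: int, n_both: int) -> tuple[int, int]:
    """Patients found only by text mining and only by ICD-9 codes."""
    return n_text - n_both, n_icd - n_both


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: confirmed cases / all flagged cases."""
    return true_positives / (true_positives + false_positives)


# Totals reported in the abstract.
tas_text_only, tas_icd_only = unique_counts(n_text=7115, n_icd=8272, n_both=4346)
cad_text_only, cad_icd_only = unique_counts(n_text=9247, n_icd=6913, n_both=6024)

# Hypothetical review outcome: 190 of 200 sampled patients confirmed.
example_ppv = ppv(true_positives=190, false_positives=10)
```

Under these totals, text mining alone identified 2769 TAS and 3223 CAD patients, while ICD-9 codes alone identified 3926 TAS and 889 CAD patients.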

CONCLUSION

These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research.
