Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

Author information

Cormack James, Nath Chinmoy, Milward David, Raja Kalpana, Jonnalagadda Siddhartha R

Affiliations

Linguamatics Ltd., 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK.

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 N. Lake Shore Drive, 11th Floor, Chicago, IL 60611, USA.

Publication information

J Biomed Inform. 2015 Dec;58 Suppl(0):S120-S127. doi: 10.1016/j.jbi.2015.06.030. Epub 2015 Jul 22.

DOI:10.1016/j.jbi.2015.06.030

PMID:26209007

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4737484/

Abstract

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system.
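The F-Score reported above summarizes precision and recall over the document-level risk factor tags. As a minimal sketch, assuming micro-averaging over (document, label) pairs, the calculation could look like the following. This is not the challenge's official evaluation script, and the document ids and labels are invented for illustration.

```python
from typing import Set, Tuple

# A tag is a (document id, risk-factor label) pair. Both fields are
# hypothetical placeholders, not the challenge's actual annotation schema.
Tag = Tuple[str, str]


def micro_prf(gold: Set[Tag], predicted: Set[Tag]) -> Tuple[float, float, float]:
    """Return micro-averaged precision, recall and F1 over two tag sets."""
    tp = len(gold & predicted)   # tags found in both gold and system output
    fp = len(predicted - gold)   # tags the system produced but gold lacks
    fn = len(gold - predicted)   # gold tags the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: four gold tags, three predicted tags, two of them correct.
gold = {
    ("doc1", "DIABETES"),
    ("doc1", "HYPERTENSION"),
    ("doc2", "CAD"),
    ("doc2", "SMOKER"),
}
predicted = {
    ("doc1", "DIABETES"),
    ("doc2", "CAD"),
    ("doc2", "OBESE"),
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because micro-averaging pools counts across all labels, frequent labels dominate the score, which is one reason imbalance in the training data can matter for the overall result.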

Similar articles

Using local lexicalized rules to identify heart disease risk factors in clinical notes.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S183-S188. doi: 10.1016/j.jbi.2015.06.013. Epub 2015 Jun 29.

Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S133-S142. doi: 10.1016/j.jbi.2015.06.014. Epub 2015 Jul 2.

Risk factor detection for heart disease by applying text analytics in electronic medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S164-S170. doi: 10.1016/j.jbi.2015.08.011. Epub 2015 Aug 14.

Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S143-S149. doi: 10.1016/j.jbi.2015.08.009. Epub 2015 Aug 21.

The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S111-S119. doi: 10.1016/j.jbi.2015.06.010. Epub 2015 Jun 26.

Coronary artery disease risk assessment from unstructured electronic health records using text mining.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S203-S210. doi: 10.1016/j.jbi.2015.08.003. Epub 2015 Aug 28.

Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S128-S132. doi: 10.1016/j.jbi.2015.08.002. Epub 2015 Aug 28.

Annotating risk factors for heart disease in clinical narratives for diabetic patients.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S78-S91. doi: 10.1016/j.jbi.2015.05.009. Epub 2015 May 21.

An automatic system to identify heart disease risk factors in clinical texts over time.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S158-S163. doi: 10.1016/j.jbi.2015.09.002. Epub 2015 Sep 8.

Cited by

Text mining for case report articles on "peritoneal dialysis" from PubMed database.

Ther Apher Dial. 2025 Jun;29(3):459-470. doi: 10.1111/1744-9987.70013. Epub 2025 Mar 26.

Clinical concept annotation with contextual word embedding in active transfer learning environment.

Digit Health. 2024 Dec 19;10:20552076241308987. doi: 10.1177/20552076241308987. eCollection 2024 Jan-Dec.

Named Entity Recognition in Electronic Health Records: A Methodological Review.

Healthc Inform Res. 2023 Oct;29(4):286-300. doi: 10.4258/hir.2023.29.4.286. Epub 2023 Oct 31.

Developing Clinical Risk Prediction Models for Worsening Heart Failure Events and Death by Left Ventricular Ejection Fraction.

J Am Heart Assoc. 2023 Oct 3;12(19):e029736. doi: 10.1161/JAHA.122.029736. Epub 2023 Sep 30.

Large-scale identification of undiagnosed hepatic steatosis using natural language processing.

EClinicalMedicine. 2023 Aug 9;62:102149. doi: 10.1016/j.eclinm.2023.102149. eCollection 2023 Aug.

Heart disease risk factors detection from electronic health records using advanced NLP and deep learning techniques.

Sci Rep. 2023 May 3;13(1):7173. doi: 10.1038/s41598-023-34294-6.

Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method.

JMIR Public Health Surveill. 2022 May 24;8(5):e30426. doi: 10.2196/30426.

A Natural Language Processing-Based Approach for Identifying Hospitalizations for Worsening Heart Failure Within an Integrated Health Care Delivery System.

JAMA Netw Open. 2021 Nov 1;4(11):e2135152. doi: 10.1001/jamanetworkopen.2021.35152.

Nd:YAG capsulotomy incidence associated with five different single-piece monofocal intraocular lenses: a 3-year Spanish real-world evidence study of 8293 eyes.

Eye (Lond). 2022 Nov;36(11):2205-2210. doi: 10.1038/s41433-021-01828-z. Epub 2021 Nov 11.

Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review.

J Diabetes Sci Technol. 2021 May;15(3):553-560. doi: 10.1177/19322968211000831. Epub 2021 Mar 19.

References

Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S67-S77. doi: 10.1016/j.jbi.2015.07.001. Epub 2015 Jul 22.

Annotating risk factors for heart disease in clinical narratives for diabetic patients.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S78-S91. doi: 10.1016/j.jbi.2015.05.009. Epub 2015 May 21.

A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

Database (Oxford). 2013 Nov 28;2013:bat080. doi: 10.1093/database/bat080. Print 2013.

Automated identification of pneumonia in chest radiograph reports in critically ill patients.

BMC Med Inform Decis Mak. 2013 Aug 15;13:90. doi: 10.1186/1472-6947-13-90.

Part-of-speech tagging for clinical text: wall or bridge between institutions?

AMIA Annu Symp Proc. 2011;2011:382-91. Epub 2011 Oct 22.

2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text.

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):552-6. doi: 10.1136/amiajnl-2011-000203. Epub 2011 Jun 16.

Extracting medication information from clinical text.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):514-8. doi: 10.1136/jamia.2010.003947.

The Enterprise Data Trust at Mayo Clinic: a semantically integrated warehouse of biomedical data.

J Am Med Inform Assoc. 2010 Mar-Apr;17(2):131-5. doi: 10.1136/jamia.2009.002691.

Description of a rule-based system for the i2b2 challenge in natural language processing for clinical data.

J Am Med Inform Assoc. 2009 Jul-Aug;16(4):571-5. doi: 10.1197/jamia.M3083. Epub 2009 Apr 23.

Ontology-based interactive information extraction from scientific abstracts.

Comp Funct Genomics. 2005;6(1-2):67-71. doi: 10.1002/cfg.456.
