A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California.

Author affiliations

California Cancer Reporting and Epidemiologic Surveillance Program, Institute for Population Health Improvement, University of California Davis Health, Sacramento, California, United States of America.

University of California Davis, Graduate Group in Epidemiology, Davis, California, United States of America.

Publication information

PLoS One. 2019 Feb 22;14(2):e0212454. doi: 10.1371/journal.pone.0212454. eCollection 2019.

DOI:10.1371/journal.pone.0212454

PMID:30794610

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6386345/

Abstract

BACKGROUND

Population-based cancer registries hold treatment information for all patients, making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are recorded as free text, which is difficult to process and summarize. We assessed the accuracy and efficiency of a text-mining algorithm that identifies systemic treatments for lung cancer from free-text fields in the California Cancer Registry.

METHODS

The algorithm used Perl regular expressions in SAS 9.4 to search for treatments in 24,845 free-text records associated with 17,310 patients in California diagnosed with stage IV non-small cell lung cancer between 2012 and 2014. Our algorithm categorized treatments into six groups that align with National Comprehensive Cancer Network guidelines. We compared results to a manual review (gold standard) of the same records.
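
The regular-expression search described above can be illustrated with a minimal sketch. It uses Python's `re` module rather than the Perl regular expressions in SAS 9.4 that the study used, and the group names, drug lists, and function names below are hypothetical illustrations: the abstract does not list the study's six treatment groups or its actual patterns.

```python
import re

# Hypothetical treatment groups and drug-name patterns, chosen for illustration.
# These are NOT the six groups or the patterns used in the study.
TREATMENT_PATTERNS = {
    "platinum_chemotherapy": r"\b(?:carboplatin|cisplatin)\b",
    "pemetrexed": r"\b(?:pemetrexed|alimta)\b",
    "bevacizumab": r"\b(?:bevacizumab|avastin)\b",
    "egfr_tki": r"\b(?:erlotinib|tarceva|gefitinib|afatinib)\b",
    "immunotherapy": r"\b(?:nivolumab|opdivo|pembrolizumab|keytruda)\b",
}

# Compile once; free-text registry fields mix upper and lower case.
COMPILED = {
    group: re.compile(pattern, re.IGNORECASE)
    for group, pattern in TREATMENT_PATTERNS.items()
}


def classify_record(text):
    """Return the set of groups whose pattern matches one free-text record."""
    return {group for group, rx in COMPILED.items() if rx.search(text)}


def classify_patient(records):
    """Combine the groups found across all free-text records of one patient."""
    groups = set()
    for text in records:
        groups |= classify_record(text)
    return groups
```

For example, `classify_record("Pt started CARBOPLATIN + Alimta 3/2013")` returns the `platinum_chemotherapy` and `pemetrexed` groups. This sketch matches drug names only; it does not handle misspellings or negated mentions such as "declined cisplatin", which a production algorithm would need to address.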

RESULTS

Percent agreement ranged from 91.1% to 99.4%. Ranges for other measures were 0.71-0.92 (Kappa), 74.3%-97.3% (sensitivity), 92.4%-99.8% (specificity), 60.4%-96.4% (positive predictive value), and 92.9%-99.9% (negative predictive value). The text-mining algorithm used one-sixth of the time required for manual review.
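
The measures reported above are standard statistics for a 2x2 comparison between an algorithm and a gold standard. As a minimal sketch, assuming that for one treatment group the counts of true positives, false positives, false negatives, and true negatives are available (the function and argument names are hypothetical, and the study's own SAS computation is not shown in the abstract), they could be computed as follows:

```python
def agreement_metrics(tp, fp, fn, tn):
    """Compare algorithm output against manual review for one treatment group.

    tp: both algorithm and manual review found the treatment
    fp: algorithm found it, manual review did not
    fn: manual review found it, algorithm did not
    tn: neither found it
    Assumes every denominator below is non-zero.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "percent_agreement": 100 * observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }
```

Kappa corrects the raw agreement for agreement expected by chance, which is why it can be noticeably lower than percent agreement when a treatment is rare.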

CONCLUSION

SAS-based text mining of free-text data can accurately detect systemic treatments administered to patients and save considerable time compared to manual review, maximizing the utility of the extant information in population-based cancer registries for comparative effectiveness research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a806/6386345/cfd4448b0850/pone.0212454.g001.jpg
