A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California.

Affiliations

California Cancer Reporting and Epidemiologic Surveillance Program, Institute for Population Health Improvement, University of California Davis Health, Sacramento, California, United States of America.

University of California Davis, Graduate Group in Epidemiology, Davis, California, United States of America.

Publication information

PLoS One. 2019 Feb 22;14(2):e0212454. doi: 10.1371/journal.pone.0212454. eCollection 2019.

Abstract

BACKGROUND

Population-based cancer registries have treatment information for all patients, making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are contained in a free-text format that is difficult to process and summarize. We assessed the accuracy and efficiency of a text-mining algorithm to identify systemic treatments for lung cancer from free-text fields in the California Cancer Registry.

METHODS

The algorithm used Perl regular expressions in SAS 9.4 to search for treatments in 24,845 free-text records associated with 17,310 patients in California diagnosed with stage IV non-small cell lung cancer between 2012 and 2014. Our algorithm categorized treatments into six groups that align with National Comprehensive Cancer Network guidelines. We compared results to a manual review (gold standard) of the same records.
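The regular-expression search described above can be illustrated with a minimal Python sketch. It is not the authors' SAS code: the category names, drug lists, and function names below are hypothetical, chosen only to show the technique, and the study's actual six groups and search terms are not given in this abstract.

```python
import re

# Hypothetical category-to-pattern map for illustration only; the study's
# actual six treatment groups and search terms are not listed in the abstract.
PATTERNS = {
    "platinum": re.compile(r"\b(cisplatin|carboplatin)\b", re.IGNORECASE),
    "bevacizumab": re.compile(r"\b(bevacizumab|avastin)\b", re.IGNORECASE),
    "tki": re.compile(r"\b(erlotinib|tarceva|crizotinib|xalkori)\b", re.IGNORECASE),
}


def categorize(text):
    """Return the set of category names whose pattern matches one free-text record."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def categorize_patient(records):
    """Combine matches across all free-text records belonging to one patient."""
    found = set()
    for record in records:
        found |= categorize(record)
    return found
```

Because a patient can have several free-text records (24,845 records for 17,310 patients), the sketch aggregates matches per patient. It deliberately ignores negation ("no carboplatin given") and misspellings, both of which a production algorithm would likely need to handle.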

RESULTS

Percent agreement ranged from 91.1% to 99.4%. Ranges for other measures were 0.71-0.92 (Kappa), 74.3%-97.3% (sensitivity), 92.4%-99.8% (specificity), 60.4%-96.4% (positive predictive value), and 92.9%-99.9% (negative predictive value). The text-mining algorithm used one-sixth of the time required for manual review.
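For readers who want to reproduce this kind of comparison, the sketch below computes the same measures from a 2x2 table of algorithm results against manual review for one treatment category. The function name and the counts in the usage note are made up for illustration and are not taken from the study.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Compare algorithm flags with a gold standard using 2x2 counts.

    tp: algorithm positive, manual review positive
    fp: algorithm positive, manual review negative
    fn: algorithm negative, manual review positive
    tn: algorithm negative, manual review negative
    Assumes every row and column total is non-zero (no zero-division guard).
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # percent agreement, as a proportion
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

With hypothetical counts `agreement_metrics(40, 10, 5, 45)`, observed agreement is 0.85 and chance agreement is 0.50, so kappa is 0.70. Kappa is lower than raw agreement because it discounts matches that would occur by chance alone.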

CONCLUSION

SAS-based text mining of free-text data can accurately detect systemic treatments administered to patients and save considerable time compared to manual review, maximizing the utility of the extant information in population-based cancer registries for comparative effectiveness research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a806/6386345/cfd4448b0850/pone.0212454.g001.jpg
