Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm.

Affiliation

Kaiser Permanente Southern California, Research and Evaluation, Pasadena, California, USA.

Publication information

J Am Med Inform Assoc. 2013 Mar-Apr;20(2):349-55. doi: 10.1136/amiajnl-2012-000928. Epub 2012 Jul 21.

Abstract

OBJECTIVE

Significant limitations exist in the timely and complete identification of primary and recurrent cancers for clinical and epidemiologic research. A SAS-based coding, extraction, and nomenclature tool (SCENT) was developed to address this problem.

MATERIALS AND METHODS

SCENT employs hierarchical classification rules to identify and extract information from electronic pathology reports. Reports are analyzed and coded using a dictionary of clinical concepts and associated SNOMED codes. To assess the accuracy of SCENT, validation was conducted using manual review of pathology reports from a random sample of 400 breast and 400 prostate cancer patients diagnosed at Kaiser Permanente Southern California. Trained abstractors classified the malignancy status of each report.
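The abstract does not describe SCENT's rules or dictionary in detail, and SCENT itself is written in SAS. The following Python sketch only illustrates the general idea of dictionary-based concept coding followed by a hierarchical classification rule. The phrases, the placeholder codes (which are not real SNOMED identifiers), the priority order, and the simple negation check are all assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary: phrase -> (concept, placeholder code).
# The codes are placeholders for illustration, not real SNOMED identifiers.
CONCEPTS = {
    "invasive ductal carcinoma": ("malignant", "PLACEHOLDER-001"),
    "adenocarcinoma": ("malignant", "PLACEHOLDER-002"),
    "atypical": ("suspicious", "PLACEHOLDER-003"),
    "benign": ("benign", "PLACEHOLDER-004"),
}

# Assumed hierarchy: the highest-priority concept found in a report wins.
PRIORITY = ["malignant", "suspicious", "benign"]

# Assumed negation cues; a real system would need a far richer treatment.
NEGATION = re.compile(r"\b(no|negative for|without)\b")


def code_report(text):
    """Return (concept, code, phrase) for each non-negated dictionary hit."""
    lowered = text.lower()
    found = []
    for phrase, (concept, code) in CONCEPTS.items():
        for match in re.finditer(re.escape(phrase), lowered):
            # Look back to the start of the current sentence for a negation cue.
            sentence_start = lowered.rfind(".", 0, match.start()) + 1
            prefix = lowered[sentence_start:match.start()]
            if NEGATION.search(prefix):
                continue
            found.append((concept, code, phrase))
    return found


def classify(text):
    """Apply the assumed hierarchy to the concepts coded in one report."""
    concepts = {concept for concept, _, _ in code_report(text)}
    for level in PRIORITY:
        if level in concepts:
            return level
    return "unclassified"


report_a = "Breast, left, biopsy: invasive ductal carcinoma. Benign fibrous tissue also present."
report_b = "Prostate biopsy: negative for adenocarcinoma. Benign prostatic tissue."
print(classify(report_a))
print(classify(report_b))
```

In this sketch a report that mentions both malignant and benign findings is classified by the higher-priority concept, which is one plausible reading of "hierarchical" and not necessarily how SCENT resolves conflicts.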

RESULTS

Classifications of SCENT were highly concordant with those of abstractors, achieving κ of 0.96 and 0.95 in the breast and prostate cancer groups, respectively. SCENT identified 51 of 54 new primary and 60 of 61 recurrent cancer cases across both groups, with only three false positives in 792 true benign cases. Measures of sensitivity, specificity, positive predictive value, and negative predictive value exceeded 94% in both cancer groups.
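As a rough check on how such figures are derived, the sketch below computes sensitivity, specificity, predictive values, and Cohen's κ from a 2×2 table built from the pooled counts quoted above. It rests on two assumptions that the abstract does not confirm: primary and recurrent cancers are collapsed into a single "malignant" class, and the four cancers SCENT did not identify are treated as having been labeled benign. The paper reports κ per cancer group with its own classification scheme, so the pooled value here is illustrative only.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard 2x2 agreement and accuracy measures."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Chance agreement from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Pooled counts taken from the abstract; the collapse to two classes
# and the handling of missed cases are assumptions (see the text above).
tp = 51 + 60                 # malignant reports identified by SCENT
fn = (54 - 51) + (61 - 60)   # malignant reports not identified
fp = 3                       # benign reports flagged as malignant
tn = 792 - 3                 # benign reports correctly left benign

metrics = binary_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions every pooled measure comes out above 0.94, in line with the reported threshold, but the numbers should not be read as a reproduction of the paper's per-group results.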

DISCUSSION

Favorable validation results suggest that SCENT can be used to identify, extract, and code information from pathology report text. Consequently, SCENT has wide applicability in research and clinical care. Further assessment will be needed to validate performance with other clinical text sources, particularly those with greater linguistic variability.

CONCLUSION

SCENT is proof of concept for SAS-based natural language processing applications that can be easily shared between institutions and used to support clinical and epidemiologic research.
