
Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm.

Affiliation

Kaiser Permanente Southern California, Research and Evaluation, Pasadena, California, USA.

Publication information

J Am Med Inform Assoc. 2013 Mar-Apr;20(2):349-55. doi: 10.1136/amiajnl-2012-000928. Epub 2012 Jul 21.

DOI:10.1136/amiajnl-2012-000928
PMID:22822041
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3638182/
Abstract

OBJECTIVE

Significant limitations exist in the timely and complete identification of primary and recurrent cancers for clinical and epidemiologic research. A SAS-based coding, extraction, and nomenclature tool (SCENT) was developed to address this problem.

MATERIALS AND METHODS

SCENT employs hierarchical classification rules to identify and extract information from electronic pathology reports. Reports are analyzed and coded using a dictionary of clinical concepts and associated SNOMED codes. To assess the accuracy of SCENT, validation was conducted using manual review of pathology reports from a random sample of 400 breast and 400 prostate cancer patients diagnosed at Kaiser Permanente Southern California. Trained abstractors classified the malignancy status of each report.
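The dictionary-plus-rules idea described above can be illustrated with a minimal Python sketch. This is not SCENT itself (which is written in SAS); the phrases, the placeholder codes, and the rule order are all hypothetical and chosen only to show how a concept dictionary and a hierarchy of rules might combine to label a report.

```python
import re

# Hypothetical concept dictionary: phrase -> (concept label, placeholder code).
# The codes are made-up stand-ins, NOT real SNOMED identifiers.
CONCEPTS = {
    "invasive ductal carcinoma": ("malignant", "X-0001"),
    "adenocarcinoma": ("malignant", "X-0002"),
    "recurrent": ("recurrence", "X-0003"),
    "benign": ("benign", "X-0004"),
}


def extract_concepts(report):
    """Return (phrase, label, code) for every dictionary phrase found in the text."""
    text = report.lower()
    found = []
    for phrase, (label, code) in CONCEPTS.items():
        # Whole-word match so that short phrases do not fire inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((phrase, label, code))
    return found


def classify(report):
    """Apply hypothetical hierarchical rules: the first rule that matches wins."""
    labels = {label for _, label, _ in extract_concepts(report)}
    if "malignant" in labels and "recurrence" in labels:
        return "recurrent"
    if "malignant" in labels:
        return "primary"
    if "benign" in labels:
        return "benign"
    return "unclassified"


if __name__ == "__main__":
    for text in (
        "Left breast biopsy: invasive ductal carcinoma.",
        "Recurrent adenocarcinoma of the prostate.",
        "Benign prostatic tissue.",
    ):
        print(classify(text), "<-", text)
```

A real system would also need negation handling, section awareness, and patient history to separate a new primary from a recurrence; the sketch omits all of these.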

RESULTS

Classifications of SCENT were highly concordant with those of abstractors, achieving κ of 0.96 and 0.95 in the breast and prostate cancer groups, respectively. SCENT identified 51 of 54 new primary and 60 of 61 recurrent cancer cases across both groups, with only three false positives in 792 true benign cases. Measures of sensitivity, specificity, positive predictive value, and negative predictive value exceeded 94% in both cancer groups.
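As a worked illustration of the accuracy measures named above, the following Python sketch computes sensitivity, specificity, predictive values, and Cohen's κ from a confusion matrix. The counts are pooled from the figures in this abstract (111 of 115 malignant cases detected, 3 false positives among 792 benign cases) under the assumption that every missed malignant case was labelled benign, which the abstract does not state. The paper reports results per cancer group and across three classes, so these pooled two-class values are not expected to reproduce the published numbers exactly.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard accuracy measures for a two-class (malignant vs benign) table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: reviewer, columns: algorithm)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column marginals.
    p_expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Pooled counts derived from the abstract under the assumption stated above.
TP = 51 + 60           # malignant cases found (new primary + recurrent)
FN = (54 - 51) + (61 - 60)  # malignant cases missed, assumed labelled benign
FP = 3                 # benign cases flagged as malignant
TN = 792 - FP          # benign cases correctly labelled

metrics = binary_metrics(TP, FN, FP, TN)
kappa = cohen_kappa([[TP, FN], [FP, TN]])

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"kappa (pooled, two-class): {kappa:.3f}")
```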

DISCUSSION

Favorable validation results suggest that SCENT can be used to identify, extract, and code information from pathology report text. Consequently, SCENT has wide applicability in research and clinical care. Further assessment will be needed to validate performance with other clinical text sources, particularly those with greater linguistic variability.

CONCLUSION

SCENT is proof of concept for SAS-based natural language processing applications that can be easily shared between institutions and used to support clinical and epidemiologic research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ec/3638182/201eda6c6109/amiajnl-2012-000928f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ec/3638182/a35609c11b81/amiajnl-2012-000928f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ec/3638182/24e189c6f77d/amiajnl-2012-000928f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ec/3638182/cda96b5032dd/amiajnl-2012-000928f03.jpg
