Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data.

Authors

Ling Albee Y, Kurian Allison W, Caswell-Jin Jennifer L, Sledge George W, Shah Nigam H, Tamang Suzanne R

Affiliations

Biomedical Informatics Training Program, Stanford University, Stanford, CA.

Department of Biomedical Data Science, Stanford University, Stanford, CA.

Publication information

JAMIA Open. 2019 Sep 18;2(4):528-537. doi: 10.1093/jamiaopen/ooz040. eCollection 2019 Dec.

DOI:10.1093/jamiaopen/ooz040
PMID:32025650
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6994019/
Abstract

OBJECTIVES

Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMR) and cancer registries contain complementary information on cancer diagnosis, treatment and outcome, yet are rarely used synergistically. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR-California Cancer Registry (CCR) data.

MATERIALS AND METHODS

We studied all female patients treated at Stanford Health Care with an incident breast cancer diagnosis from 2000 to 2014. Our database consisted of structured fields and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results Program (SEER). We identified MBC patients from CCR and extracted information on distant recurrences from patient notes in EMR. Furthermore, we trained a regularized logistic regression model for recurrent MBC classification and evaluated its performance on a gold standard set of 146 patients.
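The classification step can be illustrated with a minimal sketch of L2-regularized logistic regression over binary term features. Everything in it is an assumption made for illustration: the vocabulary, the toy notes, the hand-assigned labels, the penalty strength, and the gradient-descent settings are hypothetical and are not taken from the study, which describes a semisupervised setup without expert-labeled examples.

```python
import math

# Hypothetical vocabulary and toy notes; none of this comes from the study.
VOCAB = ["metastatic", "bone lesion", "no evidence of disease", "recurrence"]
NOTES = [
    ("New metastatic bone lesion on PET, consistent with recurrence.", 1),
    ("Biopsy confirms metastatic disease in the liver.", 1),
    ("No evidence of disease on surveillance imaging.", 0),
    ("Routine follow-up, no evidence of disease.", 0),
]

def featurize(note, vocabulary):
    """Binary bag-of-terms: 1.0 if the term occurs in the note, else 0.0."""
    text = note.lower()
    return [1.0 if term in text else 0.0 for term in vocabulary]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg_l2(X, y, l2=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2.

    The bias term b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # The penalty adds l2 * w_j to each weight gradient.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = [featurize(text, VOCAB) for text, _ in NOTES]
y = [label for _, label in NOTES]
w, b = train_logreg_l2(X, y)

p_pos = predict_proba(featurize("Imaging shows metastatic recurrence.", VOCAB), w, b)
p_neg = predict_proba(featurize("Mammogram: no evidence of disease.", VOCAB), w, b)
print(p_pos > 0.5, p_neg < 0.5)
```

The penalty shrinks the weights toward zero, which limits overfitting when the number of text-derived features is large relative to the number of patients.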

RESULTS

There were 11 459 breast cancer patients in total and the median follow-up time was 96.3 months. We identified 1886 MBC patients, 512 (27.1%) of whom were de novo MBC patients and 1374 (72.9%) were recurrent MBC patients. Our final MBC classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.917, with sensitivity 0.861, specificity 0.878, and accuracy 0.870.
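The reported figures can be checked with a short worked example. The cohort counts (512, 1374, 1886) and the 146-patient gold standard come from the abstract. The confusion-matrix counts do not: they are an assumption, back-calculated as one set of integers that reproduces the reported sensitivity, specificity, and accuracy.

```python
def share(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)

def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positives among actual positives
    specificity = tn / (tn + fp)   # true negatives among actual negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Cohort composition as stated in the abstract.
de_novo, recurrent = 512, 1374
total_mbc = de_novo + recurrent
print(total_mbc, share(de_novo, total_mbc), share(recurrent, total_mbc))

# ASSUMPTION: illustrative counts summing to the 146 gold-standard patients.
# The abstract does not report the confusion matrix.
tp, fn, tn, fp = 62, 10, 65, 9
sens, spec, acc = confusion_metrics(tp, fn, tn, fp)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

Under these assumed counts the three metrics round to 0.861, 0.878, and 0.870. Other splits of the 146 patients might also be compatible with the rounded values, so the counts should not be read as the study's actual confusion matrix.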

DISCUSSION AND CONCLUSION

To enable population-based research on MBC, we developed a framework for retrospective case detection combining EMR and CCR data. Our classifier achieved good AUC, sensitivity, and specificity without expert-labeled examples.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d3e/6994019/142ce58122a7/ooz040f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d3e/6994019/1fde333eb29e/ooz040f2.jpg
