Machine learning application for incident prostate adenocarcinomas automatic registration in a French regional cancer registry.

Affiliations

Public Health Department, Strasbourg University Hospital, 67000, Strasbourg, France.

Publication information

Int J Med Inform. 2020 Jul;139:104139. doi: 10.1016/j.ijmedinf.2020.104139. Epub 2020 Apr 9.

DOI: 10.1016/j.ijmedinf.2020.104139
PMID: 32330852
Abstract

BACKGROUND

Cancer registries are collections of curated data about malignant tumor diseases. The amount of data processed by cancer registries increases every year, making manual registration more and more tedious.

OBJECTIVE

We sought to develop an automatic analysis pipeline that would be able to identify and preprocess registry input for incident prostate adenocarcinomas in a French regional cancer registry.

METHODS

Notifications submitted to the Bas-Rhin cancer registry from different sources were used: pathology data, and ICD-10 diagnosis codes from hospital discharge data and healthcare insurance data. We trained a Support Vector Machine model (machine learning) to predict whether a patient's data should be considered a prostate adenocarcinoma incident case and therefore be registered. The final registration of all identified cases was manually confirmed by a specialized technician. Text mining tools (regular expressions) were used to extract clinical and biological data from unstructured pathology reports.
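The abstract does not give the expressions that were used, so the following is only a minimal sketch of what regular-expression extraction from a free-text pathology report can look like. The patterns, the function name `extract_indicators`, and the sample report (written in English for readability) are all hypothetical and are not the authors' implementation.

```python
import re

# Hypothetical patterns for illustration only; the expressions used in the
# study are not given in the abstract.
# "Gleason ... 3 + 4": captures the primary and secondary grades.
GLEASON_RE = re.compile(r"gleason\D{0,20}?(\d)\s*\+\s*(\d)", re.IGNORECASE)
# "PSA ... 8,5 ng/mL": captures the value, accepting a comma or dot as decimal separator.
PSA_RE = re.compile(
    r"\bPSA\b\D{0,20}?(\d+(?:[.,]\d+)?)\s*ng\s*/\s*ml", re.IGNORECASE
)


def extract_indicators(report: str) -> dict:
    """Return the Gleason grades and PSA value found in a report, or None for each."""
    result = {"gleason": None, "psa_ng_ml": None}

    match = GLEASON_RE.search(report)
    if match:
        primary, secondary = int(match.group(1)), int(match.group(2))
        # Store (primary grade, secondary grade, score = sum of both).
        result["gleason"] = (primary, secondary, primary + secondary)

    match = PSA_RE.search(report)
    if match:
        # Normalize a decimal comma before converting to float.
        result["psa_ng_ml"] = float(match.group(1).replace(",", "."))

    return result


# Made-up sample text, not taken from the registry.
sample = "Prostate biopsy: adenocarcinoma, Gleason score 3 + 4 = 7. Serum PSA: 8,5 ng/mL."
print(extract_indicators(sample))
```

Returning `None` when a pattern does not match keeps "information not present in the report" distinct from "information extracted incorrectly", which is the distinction the reported accuracy figures depend on.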

RESULTS

We performed two successive analyses. First, we used 982 cases manually labeled by registrars from the 2014 dataset to predict the registration of 785 cases submitted in 2015. Then, we repeated the procedure using the 2089 cases labeled by registrars from the 2014 and 2015 datasets to predict the registration of 926 cases submitted in the 2016 data. The algorithm identified 663 cases of prostate adenocarcinoma in 2015 and 610 in 2016. From these findings, 663 and 531 cases, respectively, were added to the registry, and 641 and 512 cases were confirmed by the specialized technician. The registration process achieved a precision above 96 %. The algorithm obtained an overall precision of 99 % (99.5 % in 2015 and 98.5 % in 2016) and a recall of 97 % (97.8 % in 2015 and 96.9 % in 2016). When the information was present in the pathology report, text mining achieved more than 90 % accuracy for the major indicators (PSA test, Gleason score, and incidence date). For both PSA and tumor side, information was not detected in the majority of cases.
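The "above 96 %" figure for the registration process is consistent with the ratio of technician-confirmed cases to cases added to the registry. The short check below uses only the counts reported above. Reading the figure as confirmed/added is an interpretation, not something the abstract states explicitly, and the function name is illustrative.

```python
def precision(confirmed: int, proposed: int) -> float:
    """Fraction of proposed cases that were confirmed on manual review."""
    if proposed <= 0:
        raise ValueError("proposed must be a positive count")
    return confirmed / proposed


# (confirmed by the technician, added to the registry), as reported in the abstract.
counts = {2015: (641, 663), 2016: (512, 531)}

for year, (confirmed, added) in counts.items():
    print(f"{year}: {precision(confirmed, added):.1%}")
# prints:
# 2015: 96.7%
# 2016: 96.4%
```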

CONCLUSION

Machine learning was able to identify new cases of prostate cancer, and text mining was able to prefill the data about incident cases. Machine-learning-based automation of the registration process could reduce delays in data production and allow investigators to devote more time to complex tasks and analysis.

Similar articles

1. Machine learning application for incident prostate adenocarcinomas automatic registration in a French regional cancer registry.
   Int J Med Inform. 2020 Jul;139:104139. doi: 10.1016/j.ijmedinf.2020.104139. Epub 2020 Apr 9.
2. Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning.
   J Am Med Inform Assoc. 2016 Nov;23(6):1077-1084. doi: 10.1093/jamia/ocw006. Epub 2016 Mar 28.
3. Automated selection of relevant information for notification of incident cancer cases within a multisource cancer registry.
   Methods Inf Med. 2013;52(5):411-21. doi: 10.3414/ME12-01-0101. Epub 2013 Apr 24.
4. Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction.
   Artif Intell Med. 2016 Jun;70:77-83. doi: 10.1016/j.artmed.2016.06.001. Epub 2016 Jun 8.
5. Cross-registry neural domain adaptation to extract mutational test results from pathology reports.
   J Biomed Inform. 2019 Sep;97:103267. doi: 10.1016/j.jbi.2019.103267. Epub 2019 Aug 8.
6. Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing.
   J Stroke Cerebrovasc Dis. 2019 Jul;28(7):2045-2051. doi: 10.1016/j.jstrokecerebrovasdis.2019.02.004. Epub 2019 May 15.
7. Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning-Based Text-Mining Approach.
   JMIR Public Health Surveill. 2023 Jul 12;9:e44467. doi: 10.2196/44467.
8. Using text mining techniques to extract prostate cancer predictive information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng province, South Africa.
   BMC Med Inform Decis Mak. 2021 Nov 25;21(1):330. doi: 10.1186/s12911-021-01697-2.
9. Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.
   J Am Med Inform Assoc. 2016 Jul;23(4):766-72. doi: 10.1093/jamia/ocw041. Epub 2016 Apr 27.
10. Automated classification of free-text pathology reports for registration of incident cases of cancer.
    Methods Inf Med. 2012;51(3):242-51. doi: 10.3414/ME11-01-0005. Epub 2011 Jul 26.

Cited by

1. Automated, High-Throughput Platform to Generate a High-Reliability, Comprehensive Rectal Cancer Database.
   JCO Clin Cancer Inform. 2024 May;8:e2300219. doi: 10.1200/CCI.23.00219.
2. Machine Learning Meets Cancer.
   Cancers (Basel). 2024 Mar 8;16(6):1100. doi: 10.3390/cancers16061100.
3. Natural Language Processing in Diagnostic Texts from Nephropathology.
   Diagnostics (Basel). 2022 Jul 15;12(7):1726. doi: 10.3390/diagnostics12071726.
4. Machine Learning-Based Extraction of Breast Cancer Receptor Status From Bilingual Free-Text Pathology Reports.
   Front Digit Health. 2021 Aug 17;3:692077. doi: 10.3389/fdgth.2021.692077. eCollection 2021.
5. Text Mining for Building Biomedical Networks Using Cancer as a Case Study.
   Biomolecules. 2021 Sep 29;11(10):1430. doi: 10.3390/biom11101430.