Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records.

Author information

Kop Reinier, Hoogendoorn Mark, Teije Annette Ten, Büchner Frederike L, Slottje Pauline, Moons Leon M G, Numans Mattijs E

Affiliation

VU University Amsterdam, Department of Computer Science, Amsterdam, The Netherlands.

Publication information

Comput Biol Med. 2016 Sep 1;76:30-8. doi: 10.1016/j.compbiomed.2016.06.019. Epub 2016 Jun 22.

DOI:10.1016/j.compbiomed.2016.06.019

PMID:27392227

Abstract

Over the past years, research utilizing routine care data extracted from Electronic Medical Records (EMRs) has increased tremendously. Yet there are no straightforward, standardized strategies for pre-processing these data. We propose a dedicated medical pre-processing pipeline aimed at taking on many problems and opportunities contained within EMR data, such as their temporal, inaccurate and incomplete nature. The pipeline is demonstrated on a dataset of routinely recorded data in general practice EMRs of over 260,000 patients, in which the occurrence of colorectal cancer (CRC) is predicted using various machine learning techniques (i.e., CART, LR, RF) and subsets of the data. CRC is a common type of cancer, of which early detection has proven to be important yet challenging. The results are threefold. First, the predictive models generated using our pipeline reconfirmed known predictors and identified new, medically plausible, predictors derived from the cardiovascular and metabolic disease domain, validating the pipeline's effectiveness. Second, the difference between the best model generated by the data-driven subset (AUC 0.891) and the best model generated by the current state of the art hypothesis-driven subset (AUC 0.864) is statistically significant at the 95% confidence interval level. Third, the pipeline itself is highly generic and independent of the specific disease targeted and the EMR used. In conclusion, the application of established machine learning techniques in combination with the proposed pipeline on EMRs has great potential to enhance disease prediction, and hence early detection and intervention in medical practice.
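The second result compares two AUC values and reports that their difference is significant at the 95% level. The abstract does not say which statistical procedure was used. As a minimal sketch, assuming a paired percentile bootstrap (an assumption for illustration, not the authors' stated method), such a comparison could look like the following. The function names and the synthetic labels and scores are hypothetical and do not come from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=200, seed=0):
    """Approximate 95% percentile interval for AUC(a) - AUC(b).

    Patients are resampled with replacement, and both models are scored on
    the same resample so that the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains only one class
        a = auc(ys, [scores_a[i] for i in idx])
        b = auc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[max(int(0.975 * n_boot) - 1, 0)]
    return lower, upper


# Synthetic illustration only: model A is given less noisy scores than model B.
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.2 else 0 for _ in range(200)]
scores_a = [y + data_rng.gauss(0, 0.8) for y in labels]
scores_b = [y + data_rng.gauss(0, 1.5) for y in labels]

lower, upper = bootstrap_auc_difference(labels, scores_a, scores_b)
print("AUC A:", round(auc(labels, scores_a), 3))
print("AUC B:", round(auc(labels, scores_b), 3))
print("95% interval for the difference:", (round(lower, 3), round(upper, 3)))
```

Under this assumed procedure, an interval that excludes zero would be read as a significant difference at the 95% level. The pairwise AUC loop is quadratic in the number of patients, so a dataset the size of the one in the abstract would need a rank-based formulation instead.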

Similar articles

Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer.

Artif Intell Med. 2016 May;69:53-61. doi: 10.1016/j.artmed.2016.03.003. Epub 2016 Mar 31.

Automated data extraction and ensemble methods for predictive modeling of breast cancer outcomes after radiation therapy.

Med Phys. 2019 Feb;46(2):1054-1063. doi: 10.1002/mp.13314. Epub 2018 Dec 28.

An Interpretable Data-Driven Medical Knowledge Discovery Pipeline Based on Artificial Intelligence.

IEEE J Biomed Health Inform. 2023 Oct;27(10):5099-5109. doi: 10.1109/JBHI.2023.3299339. Epub 2023 Oct 5.

Healthcare pathway discovery and probabilistic machine learning.

Int J Med Inform. 2020 May;137:104087. doi: 10.1016/j.ijmedinf.2020.104087. Epub 2020 Feb 24.

Machine Learning for the Prediction of New-Onset Diabetes Mellitus during 5-Year Follow-up in Non-Diabetic Patients with Cardiovascular Risks.

Yonsei Med J. 2019 Feb;60(2):191-199. doi: 10.3349/ymj.2019.60.2.191.

Efficient Mining Template of Predictive Temporal Clinical Event Patterns From Patient Electronic Medical Records.

IEEE J Biomed Health Inform. 2019 Sep;23(5):2138-2147. doi: 10.1109/JBHI.2018.2877255. Epub 2018 Oct 22.

Early Prediction of Sepsis in EMR Records Using Traditional ML Techniques and Deep Learning LSTM Networks.

Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:4038-4041. doi: 10.1109/EMBC.2018.8513254.

Automatic infection detection based on electronic medical records.

BMC Bioinformatics. 2018 Apr 11;19(Suppl 5):117. doi: 10.1186/s12859-018-2101-x.

Employing heat maps to mine associations in structured routine care data.

Artif Intell Med. 2014 Feb;60(2):79-88. doi: 10.1016/j.artmed.2013.12.003. Epub 2013 Dec 15.

Cited by

Comprehensive application of artificial intelligence in colorectal cancer: A review.

iScience. 2025 Jun 23;28(7):112980. doi: 10.1016/j.isci.2025.112980. eCollection 2025 Jul 18.

A Novel Ensemble Framework for Comprehensive Early-Stage Colorectal Cancer Diagnosis, Prognosis, and Treatment: Integration of Gastroenterology-Specific Transformer Language Models and Multiple Decision Trees.

J Clin Med. 2025 Jun 23;14(13):4467. doi: 10.3390/jcm14134467.

An Order-Sensitive Hierarchical Neural Model for Early Lung Cancer Detection Using Dutch Primary Care Notes and Structured Data.

Cancers (Basel). 2025 Mar 29;17(7):1151. doi: 10.3390/cancers17071151.

Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review.

BMC Med Res Methodol. 2025 Jan 28;25(1):24. doi: 10.1186/s12874-025-02473-w.

Artificial Intelligence and the Future of Gastroenterology and Hepatology.

Gastro Hep Adv. 2022 May 11;1(4):581-595. doi: 10.1016/j.gastha.2022.02.025. eCollection 2022.

A stacking ensemble model for predicting the occurrence of carotid atherosclerosis.

Front Endocrinol (Lausanne). 2024 Jul 23;15:1390352. doi: 10.3389/fendo.2024.1390352. eCollection 2024.

The performance of FIT-based and other risk prediction models for colorectal neoplasia in symptomatic patients: a systematic review.

EClinicalMedicine. 2023 Sep 21;64:102204. doi: 10.1016/j.eclinm.2023.102204. eCollection 2023 Oct.

A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data.

PLoS One. 2023 Sep 8;18(9):e0274276. doi: 10.1371/journal.pone.0274276. eCollection 2023.

Early detection of colorectal cancer by leveraging Dutch primary care consultation notes with free text embeddings.

Sci Rep. 2023 Jul 4;13(1):10760. doi: 10.1038/s41598-023-37397-2.

Early identification of persistent somatic symptoms in primary care: data-driven and theory-driven predictive modelling based on electronic medical records of Dutch general practices.

BMJ Open. 2023 May 2;13(5):e066183. doi: 10.1136/bmjopen-2022-066183.
