Department of Electrical Engineering ESAT, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium.
JMIR Med Inform. 2014 Oct 20;2(2):e28. doi: 10.2196/medinform.3251.
Clinical diagnostic model research uses machine-learning techniques to extract diagnostic models from patient data. Traditionally, patient data are collected with electronic Case Report Form (eCRF) systems, while separate mathematical software is used to analyze these data with machine-learning techniques. Because eCRF systems and mathematical software are not integrated, extracting diagnostic models is a complex, error-prone process. Moreover, because of this complexity, the process is usually performed only once, after a predetermined number of data points have been collected, without insight into the predictive performance of the resulting models.
The objective of the Clinical Data Miner (CDM) software framework is to offer an eCRF system with integrated data preprocessing and machine-learning libraries, improving the efficiency of the clinical diagnostic model research workflow, and to enable optimization of patient inclusion numbers through study performance monitoring.
The CDM software framework was developed using a test-driven development (TDD) approach to ensure high software quality. Architecturally, CDM's design is split into several modules to ensure future extensibility.
The TDD approach has enabled us to deliver high software quality. CDM's eCRF Web interface is in active use in studies of the International Endometrial Tumor Analysis consortium, with over 4000 patients enrolled and more studies planned. Additionally, a derived user interface has been used in six separate interrater agreement studies. CDM's integrated data preprocessing and machine-learning libraries simplify some otherwise manual and error-prone steps in the clinical diagnostic model research workflow. Furthermore, CDM's libraries give study coordinators a way to monitor a study's predictive performance as patient inclusions increase.
To our knowledge, CDM is the only eCRF system integrating data preprocessing and machine-learning libraries. This integration improves the efficiency of the clinical diagnostic model research workflow. Moreover, by simplifying the generation of learning curves, CDM enables study coordinators to assess more accurately when data collection can be terminated, resulting in better models or lower patient recruitment costs.
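The abstract does not specify how CDM computes its learning curves, so the following is only a minimal, stdlib-only Python sketch under stated assumptions: patients arrive in inclusion order as (feature vector, binary outcome) pairs, a logistic-regression model is retrained on the first n inclusions, and its discrimination is scored as the area under the ROC curve (AUC) on a fixed held-out set. All function names, the synthetic data, and the plateau-based stopping heuristic are hypothetical illustrations, not CDM's API or the authors' method; in practice, cross-validation would usually replace the single hold-out set.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical record format: (feature vector, binary outcome). This is an
# assumption for illustration, not CDM's actual data model.
Patient = Tuple[List[float], int]
Model = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data: Sequence[Patient], epochs: int = 200, lr: float = 0.1) -> Model:
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the log-loss w.r.t. the linear score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise definition."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def learning_curve(pool, test_set, sizes) -> List[Tuple[int, float]]:
    """Train on the first n included patients for each n; score on a fixed test set."""
    labels = [y for _, y in test_set]
    curve = []
    for n in sizes:
        model = train_logistic(pool[:n])
        scores = [predict(model, x) for x, _ in test_set]
        curve.append((n, auc(scores, labels)))
    return curve


def plateaued(curve, tol: float = 0.01, window: int = 2) -> bool:
    """Hypothetical stopping heuristic: AUC gained less than `tol` over the last `window` steps."""
    if len(curve) <= window:
        return False
    return curve[-1][1] - curve[-1 - window][1] < tol


def simulate(n: int, rng: random.Random) -> List[Patient]:
    """Synthetic patients with two features; purely illustrative data."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = 1 if x[0] + 0.5 * x[1] + rng.gauss(0.0, 0.5) > 0 else 0
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    pool = simulate(200, rng)      # patients in order of inclusion
    test_set = simulate(200, rng)  # fixed hold-out set
    curve = learning_curve(pool, test_set, [20, 50, 100, 200])
    for n, a in curve:
        print(f"n={n:4d}  AUC={a:.3f}")
    print("plateau reached:", plateaued(curve))
```

A flattening curve suggests that further inclusions are unlikely to improve the model much, which is the kind of signal a study coordinator could weigh against recruitment costs. The tolerance and window in `plateaued` are arbitrary illustrative choices.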