Department of Radiation Oncology (MAASTRO Clinic), Maastricht University Medical Centre (MUMC+), The Netherlands.
Radiother Oncol. 2013 Jul;108(1):174-9. doi: 10.1016/j.radonc.2012.09.019. Epub 2013 Feb 5.
Collecting trial data in a medical environment is at present mostly performed manually and is therefore time-consuming, prone to errors and, given the complexity of the data, often incomplete. Faster and more accurate methods are needed to improve data quality and to shorten data collection times, particularly where information is scattered over multiple data sources. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field.
In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data-collection processes. Two sets of clinical parameters were compiled for non-small cell lung cancer (NSCLC) and rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction method.
The average time per case to collect the NSCLC data was 10.4 ± 2.1 min manually and 4.3 ± 1.1 min with the automated method (p<0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p<0.001). For 3.2% of the data collected for NSCLC and 5.3% of the data collected for rectal cancer, the manual and automated methods disagreed.
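To put the reported figures in perspective, the following is a minimal sketch of how a relative time saving and a discrepancy rate could be computed. It is not the authors' analysis code: the function names and the example field names (`stage`, `gtv_volume_cc`) are hypothetical, and treating a discrepancy as any field whose manual and automated values differ is an assumption rather than the definition used in the study.

```python
def relative_reduction(manual_min: float, automated_min: float) -> float:
    """Fractional reduction in mean collection time per case."""
    if manual_min <= 0:
        raise ValueError("manual_min must be positive")
    return (manual_min - automated_min) / manual_min


def discrepancy_rate(manual: dict, automated: dict) -> float:
    """Share of fields present in both records whose values differ.

    Assumption: a discrepancy is any unequal pair of values.
    """
    shared = manual.keys() & automated.keys()
    if not shared:
        raise ValueError("records share no fields")
    mismatches = sum(1 for key in shared if manual[key] != automated[key])
    return mismatches / len(shared)


# Mean times per case (minutes) as reported in the abstract.
print(f"NSCLC:  {relative_reduction(10.4, 4.3):.1%} less time")
print(f"Rectal: {relative_reduction(13.5, 6.8):.1%} less time")

# Hypothetical records, for illustration only.
manual_record = {"stage": "IIIA", "gtv_volume_cc": 85.2}
automated_record = {"stage": "IIIA", "gtv_volume_cc": 85.4}
print(f"Discrepancy rate: {discrepancy_rate(manual_record, automated_record):.0%}")
```

With the reported means, the automated method takes roughly 59% less time per case for NSCLC and roughly 50% less for rectal cancer.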
Aggregating multiple data sources in a data warehouse, combined with tools for extracting relevant parameters, shortens data collection times and offers the ability to improve data quality. The initial investment in digitizing the data is expected to be offset by the flexibility it brings to data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases.