Anyimadu Eric Ababio, Fuller Clifton David, Zhang Xinhua, Elisabeta Marai G, Canahuate Guadalupe
Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52242, USA.
Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Database Expert Syst Appl (2024). 2024 Aug;14910:231-248. doi: 10.1007/978-3-031-68309-1_20. Epub 2024 Aug 18.
This study addresses missing data in patient-reported outcome datasets, focusing on head and neck cancer patient symptom ratings from the MD Anderson Symptom Inventory. Because many data mining and machine learning algorithms require complete datasets, accurately imputing the missing values is a crucial first step. We propose, for the first time, the use of collaborative filtering to impute missing head and neck cancer patient symptom ratings. Two configurations of collaborative filtering, patient-based and symptom-based, use the known ratings to infer the missing ones. We also compare collaborative filtering with alternative imputation methods, namely Multiple Imputation by Chained Equations, Nearest Neighbor Imputation, and linear interpolation, using Root Mean Squared Error and Mean Absolute Error as performance metrics. The findings indicate that collaborative filtering is a viable approach that performs better than the compared alternatives for imputing missing patient symptom data.
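To make the two configurations concrete, the following is a minimal sketch of neighborhood-based collaborative filtering on a patients x symptoms rating matrix, with missing ratings stored as NaN. It is not the authors' implementation: the cosine similarity over co-rated entries, the neighborhood size `k`, the column-mean fallback, and the names `cf_impute`, `rmse`, and `mae` are all assumptions chosen for illustration. The symptom-based configuration is obtained here by simply transposing the matrix.

```python
import numpy as np

def cf_impute(ratings, k=2):
    """Fill NaNs in a (rows x columns) matrix with row-based collaborative
    filtering. Hypothetical helper: cosine similarity and k are assumptions."""
    R = np.asarray(ratings, dtype=float)
    known = ~np.isnan(R)
    filled = R.copy()
    for i, j in zip(*np.where(~known)):
        sims, vals = [], []
        for u in range(R.shape[0]):
            if u == i or not known[u, j]:
                continue  # a neighbor must have rated column j
            both = known[i] & known[u]  # entries rated by both rows
            if not both.any():
                continue
            a, b = R[i, both], R[u, both]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom == 0:
                continue
            sim = float(a @ b) / denom
            if sim > 0:  # keep only positively similar neighbors
                sims.append(sim)
                vals.append(R[u, j])
        if sims:
            top = np.argsort(sims)[::-1][:k]  # k most similar neighbors
            w, v = np.array(sims)[top], np.array(vals)[top]
            filled[i, j] = (w @ v) / w.sum()  # similarity-weighted mean
        else:
            # Fallback (assumption): column mean, else global mean.
            col = R[known[:, j], j]
            filled[i, j] = col.mean() if col.size else np.nanmean(R)
    return filled

def rmse(true, pred):
    d = np.asarray(true, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(true, pred):
    d = np.asarray(true, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(d)))

# Toy data: 3 patients x 3 symptoms, one rating hidden (true value 6).
R = np.array([[2.0, 4.0, 6.0],
              [2.0, 4.0, 6.0],
              [2.0, 4.0, np.nan]])

patient_based = cf_impute(R)        # similarities between patients (rows)
symptom_based = cf_impute(R.T).T    # similarities between symptoms (columns)

print("patient-based RMSE:", rmse([6.0], [patient_based[2, 2]]))
print("symptom-based MAE:", mae([6.0], [symptom_based[2, 2]]))
```

In an evaluation like the one described, known ratings would be hidden, imputed, and compared against their true values with both metrics. On this toy matrix the two configurations give different estimates for the same cell, which is why they are worth comparing separately.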