Imputation of missing values for cochlear implant candidate audiometric data and potential applications.

Author information

Department of Otolaryngology Head and Neck Surgery, Washington University School of Medicine, St. Louis, Missouri, United States of America.

Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, United States of America.

Publication information

PLoS One. 2023 Feb 6;18(2):e0281337. doi: 10.1371/journal.pone.0281337. eCollection 2023.

Abstract

OBJECTIVE

Assess the real-world performance of popular imputation algorithms on cochlear implant (CI) candidate audiometric data.

METHODS

A total of 7,451 audiograms from patients undergoing CI candidacy evaluation were pooled from 32 institutions; complete case analysis yielded 1,304 audiograms. Imputation model performance was assessed with nested cross-validation on randomly generated sparse datasets with varying amounts of missing data, distributions of sparsity, and dataset sizes. A threshold for safe imputation was defined as root mean square error (RMSE) < 10 dB. Models included univariate imputation, interpolation, multiple imputation by chained equations (MICE), k-nearest neighbors, gradient boosted trees, and neural networks.
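The evaluation idea described above (hide known values, impute them, and score the error on the hidden entries) can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: it uses synthetic thresholds rather than real audiograms, it uses column-mean (univariate) imputation as the simplest of the listed model families, and it omits nested cross-validation. The function names `mean_impute`, `mask_random`, and `rmse` are hypothetical.

```python
import math
import random

SAFE_RMSE_DB = 10.0  # safety threshold stated in the abstract


def mean_impute(rows):
    """Univariate imputation: fill each missing value (None) with its column mean."""
    observed_all = [v for r in rows for v in r if v is not None]
    grand_mean = sum(observed_all) / len(observed_all)
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to the grand mean if a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else grand_mean)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def mask_random(rows, n_missing, rng):
    """Hide n_missing randomly chosen entries per row.

    Returns the sparse copy and the list of hidden (row, column) positions.
    """
    sparse, hidden = [], []
    for i, r in enumerate(rows):
        cols = set(rng.sample(range(len(r)), n_missing))
        sparse.append([None if j in cols else v for j, v in enumerate(r)])
        hidden.extend((i, j) for j in cols)
    return sparse, hidden


def rmse(truth, imputed, positions):
    """Root mean square error over the hidden positions only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic 11-frequency "audiograms" (assumption: made-up values, not study data).
rng = random.Random(0)
truth = []
for _ in range(200):
    base = rng.uniform(40.0, 90.0)
    truth.append([base + 3.0 * j + rng.gauss(0.0, 5.0) for j in range(11)])

sparse, hidden = mask_random(truth, 3, rng)
imputed = mean_impute(sparse)
score = rmse(truth, imputed, hidden)
print(f"RMSE on hidden values: {score:.2f} dB (safe: {score < SAFE_RMSE_DB})")
```

Scoring only the hidden positions matters: including observed entries, which are copied through unchanged, would dilute the error and make any imputer look better than it is.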

RESULTS

Greater quantities of missing data were associated with worse performance. Sparsity in audiometric data is not uniformly distributed, as inter-octave frequencies are less commonly tested. With 3-8 missing features per instance, a real-world sparsity distribution was associated with significantly better performance compared to other sparsity distributions (Δ RMSE 0.3 dB to 5.8 dB, non-overlapping 99% confidence intervals). With a real-world sparsity distribution, models were able to safely impute up to 6 missing datapoints in an 11-frequency audiogram. MICE consistently outperformed other models across all metrics and sparsity distributions (p < 0.01, Wilcoxon rank sum test). With sparsity capped at 6 missing features per audiogram but otherwise equivalent to the raw dataset, MICE imputed with RMSE of 7.83 dB [95% CI 7.81-7.86]. Imputing up to 6 missing features captures 99.3% of the audiograms in our dataset, allowing for a 5.7-fold increase in dataset size (1,304 to 7,399 audiograms) as compared with complete case analysis.
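The coverage and fold-increase figures follow directly from the three counts reported in the abstract, as the short check below shows. The helper `usable_rows` is a hypothetical illustration of capping sparsity at a maximum number of missing features per audiogram; it is not taken from the paper.

```python
TOTAL = 7451      # pooled audiograms
COMPLETE = 1304   # audiograms with no missing values (complete case analysis)
USABLE = 7399     # audiograms with at most 6 missing features

fold_increase = USABLE / COMPLETE
coverage = USABLE / TOTAL

print(f"fold increase: {fold_increase:.1f}")  # 5.7
print(f"coverage: {coverage:.1%}")            # 99.3%


def usable_rows(rows, max_missing=6):
    """Keep rows whose count of missing values (None) does not exceed max_missing."""
    return [r for r in rows if sum(v is None for v in r) <= max_missing]
```

By the same arithmetic, complete case analysis alone retains 1,304 of 7,451 audiograms, or about 17.5% of the pooled data.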

CONCLUSION

Precision medicine will inevitably play an integral role in the future of hearing healthcare. These methods are data dependent, and rigorously validated imputation models are a key tool for maximizing datasets. Using the largest CI audiogram dataset to date, we demonstrate that in a real-world scenario MICE can safely impute missing data for the vast majority (>99%) of audiograms with RMSE well below a clinically significant threshold of 10 dB. Evaluation across a range of dataset sizes and sparsity distributions suggests a high degree of generalizability to future applications.
