Detecting contaminated birthdates using generalized additive models.

Author information

Luo Wei, Gallagher Marcus, Loveday Bill, Ballantyne Susan, Connor Jason P, Wiles Janet

Affiliation

Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia.

Publication information

BMC Bioinformatics. 2014 Jun 12;15:185. doi: 10.1186/1471-2105-15-185.

Abstract

BACKGROUND

Erroneous patient birthdates are common in health databases. Detecting these errors usually requires manual verification, which can be resource-intensive and impractical. Building on a frequent manifestation of birthdate errors, this paper presents a principled, statistically driven procedure for identifying erroneous patient birthdates.
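The abstract does not describe the model, so what follows is only a minimal sketch of the general idea under stated assumptions. It assumes that contamination shows up as an excess of records on particular calendar days, for example a placeholder date entered when the true birthdate is unknown. It also assumes that a local median of neighbouring days can stand in for the GAM's fitted expected count. This is not the authors' GAM. The function names, the window width, the Poisson assumption and the Bonferroni-corrected threshold are all hypothetical choices made for illustration.

```python
import math
import statistics


def poisson_upper_tail(k: int, mu: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mu) as 1 minus the pmf summed below k."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0); underflows for very large mu (> ~700)
    cdf = term
    for i in range(1, k):
        term *= mu / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_excess_days(counts: list[int], half_window: int = 7, alpha: float = 0.01) -> list[int]:
    """Return indices of days whose record count is improbably high.

    Hypothetical stand-in for a GAM: the expected count for day i is the median
    of the counts on neighbouring days within +/- half_window (day i excluded).
    Each day is tested against a Poisson upper tail at the Bonferroni threshold
    alpha / len(counts), which bounds the chance of any false flag under the model.
    """
    n = len(counts)
    threshold = alpha / n
    flagged = []
    for i, observed in enumerate(counts):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neighbours = counts[lo:i] + counts[i + 1:hi]
        expected = max(statistics.median(neighbours), 0.5)  # floor avoids mu = 0
        if poisson_upper_tail(observed, expected) < threshold:
            flagged.append(i)
    return flagged


# Toy data: 30 days with 10 records each, plus a spike on day 15 of the kind
# a default or placeholder birthdate might produce.
daily_counts = [10] * 30
daily_counts[15] = 40
print(flag_excess_days(daily_counts))  # [15]
```

A GAM would replace the median baseline with a fitted smooth trend plus seasonal and weekday terms. That is what lets known demographic trends and birth patterns enter the model explicitly rather than being absorbed into a crude local average.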

RESULTS

Generalized additive models (GAMs) allowed known demographic trends and birth patterns to be incorporated explicitly. With the false positive rate controlled, the method identified birthdate contamination with high accuracy. In the health data set used, the domain expert manually identified 58 incorrect birthdates; the GAM-based method identified 51 of them, with 8 false positives, giving a positive predictive value of 86.4% (51/59) and a false negative rate of 12.1% (7/58). These results outperformed those of linear time-series models.
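The two summary figures follow from the standard confusion-matrix definitions, PPV = TP / (TP + FP) and FNR = FN / (TP + FN). The short Python sketch recomputes them from the counts reported in the results. The function name `detection_metrics` is hypothetical and does not come from the paper.

```python
def detection_metrics(true_positives: int, false_positives: int, false_negatives: int) -> tuple[float, float]:
    """Return (positive predictive value, false negative rate)."""
    flagged = true_positives + false_positives        # all dates the method flagged
    actual_errors = true_positives + false_negatives  # all errors confirmed by the expert
    ppv = true_positives / flagged
    fnr = false_negatives / actual_errors
    return ppv, fnr


# Reported counts: 58 expert-confirmed errors, 51 of them detected,
# 8 false positives, so 58 - 51 = 7 missed errors.
ppv, fnr = detection_metrics(true_positives=51, false_positives=8, false_negatives=58 - 51)
print(f"PPV = {ppv:.1%}, FNR = {fnr:.1%}")  # PPV = 86.4%, FNR = 12.1%
```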

CONCLUSIONS

The GAM-based method is an effective and highly accurate approach for identifying systemic birthdate errors, a common data quality issue in both clinical and administrative databases.
