Detecting contaminated birthdates using generalized additive models.

Authors

Luo Wei, Gallagher Marcus, Loveday Bill, Ballantyne Susan, Connor Jason P, Wiles Janet

Affiliation

Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia.

Publication

BMC Bioinformatics. 2014 Jun 12;15:185. doi: 10.1186/1471-2105-15-185.

DOI:10.1186/1471-2105-15-185
PMID:24923281
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4065390/
Abstract

BACKGROUND

Erroneous patient birthdates are common in health databases. Detection of these errors usually involves manual verification, which can be resource-intensive and impractical. By identifying a frequent manifestation of birthdate errors, this paper presents a principled and statistically driven procedure to identify erroneous patient birthdates.

RESULTS

Generalized additive models (GAM) enabled explicit incorporation of known demographic trends and birth patterns. With false positive rates controlled, the method identified birthdate contamination with high accuracy. In the health data set used, of the 58 actual incorrect birthdates manually identified by the domain expert, the GAM-based method identified 51, with 8 false positives, giving a positive predictive value of 86.0% (51/59) and a false negative rate of 12.0% (7/58). These results outperformed linear time-series models.
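The two rates quoted above follow directly from the reported counts (51 true positives, 8 false positives, and 58 - 51 = 7 missed errors). The short sketch below recomputes them; the function name `detection_metrics` is introduced here for illustration and is not from the paper. Note that the fractions as printed evaluate to roughly 86.4% and 12.1%, slightly different from the rounded percentages in the abstract.

```python
def detection_metrics(true_positives, false_positives, false_negatives):
    """Return positive predictive value and false negative rate.

    PPV = TP / (TP + FP): share of flagged birthdates that were truly wrong.
    FNR = FN / (TP + FN): share of truly wrong birthdates that were missed.
    """
    flagged = true_positives + false_positives
    actual_errors = true_positives + false_negatives
    return {
        "ppv": true_positives / flagged,
        "fnr": false_negatives / actual_errors,
    }


# Counts taken from the abstract: 51 detected, 8 false positives, 7 missed.
metrics = detection_metrics(true_positives=51, false_positives=8, false_negatives=7)
print(f"PPV = {metrics['ppv']:.1%}, FNR = {metrics['fnr']:.1%}")
```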

CONCLUSIONS

The GAM-based method identifies systemic birthdate errors, a common data quality issue in both clinical and administrative databases, with high accuracy.
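The abstract does not spell out the model, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that contamination shows up as an excess of records on particular dates (an assumption, not stated in the abstract) and swaps the GAM for a much simpler baseline: the expected count for a date is the median count on the same weekday in nearby weeks, and a date is flagged when its Poisson upper-tail probability falls below a Bonferroni-corrected threshold, which is one simple way to keep false positives controlled. All names, the synthetic data, and the parameters `alpha` and `weeks` are hypothetical.

```python
import math
from datetime import date, timedelta
from statistics import median


def poisson_upper_tail(observed, mu):
    """P(X >= observed) for X ~ Poisson(mu), summed term by term."""
    if observed <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    term = math.exp(observed * math.log(mu) - mu - math.lgamma(observed + 1))
    total, k = 0.0, observed
    for _ in range(10_000):  # hard cap on the number of summed terms
        total += term
        k += 1
        term *= mu / k
        if term < total * 1e-17:
            break
    return min(total, 1.0)


def expected_count(counts, day, weeks=4):
    """Median count on the same weekday within +/- `weeks` weeks, excluding `day`."""
    neighbours = [
        counts[d]
        for w in range(1, weeks + 1)
        for d in (day - timedelta(weeks=w), day + timedelta(weeks=w))
        if d in counts
    ]
    return float(median(neighbours)) if neighbours else None


def flag_excess_dates(counts, alpha=0.05, weeks=4):
    """Flag dates whose count is improbably high relative to the local baseline."""
    threshold = alpha / len(counts)  # Bonferroni correction over all tested dates
    flagged = []
    for day in sorted(counts):
        mu = expected_count(counts, day, weeks)
        if mu is not None and poisson_upper_tail(counts[day], mu) < threshold:
            flagged.append(day)
    return flagged


# Synthetic, deterministic data: fewer births on weekends, one injected spike.
start = date(1980, 1, 1)
counts = {}
for i in range(70):
    d = start + timedelta(days=i)
    counts[d] = (12 if d.weekday() < 5 else 8) + (i % 3) - 1
spike = start + timedelta(days=30)
counts[spike] = 60  # hypothetical contaminated date

flagged = flag_excess_dates(counts)
print(flagged)
```

A real GAM would replace `expected_count` with smooth terms for long-run trend and seasonality; the median baseline is used here only because it needs nothing beyond the standard library.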
