
Understanding the limits of large datasets.

Authors

Sanders Catherine M, Saltzstein Sidney L, Schultzel Matthew M, Nguyen Duy H, Stafford Helen Shi, Sadler Georgia Robins

Affiliation

Rebecca and John Moores UCSD Cancer Center, University of California San Diego, La Jolla, CA 92093-0850, USA.

Publication

J Cancer Educ. 2012 Dec;27(4):664-9. doi: 10.1007/s13187-012-0383-7.

Abstract

Many health professionals use large datasets to answer behavioral, translational, or clinical questions. Understanding the impact of missing data in large databases, such as disease registries, can avoid erroneous interpretations of these data. Using the California Cancer Registry, the authors selected seven common cancers, seven sociodemographic and clinical variables, and the top three reporting sources, as examples of the type of data that would be deemed critical to most studies. The gender variable had no missing data, followed by age (<0.1 % missing), ethnicity (1.7 %), stage (9.8 %), differentiation (39.1 %), and birthplace (41.1 %). Reports from hospitals and clinics had the lowest percentages of missing data. Users of large datasets should anticipate the limitations of missing data to prevent methodological flaws and misinterpretations of research findings. Knowledge of what and how much data may be missing in large datasets can help prevent errors in research conclusions, while better guiding treatment modalities and public health policies and programs.
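The per-variable missingness figures above can be reproduced for any tabular dataset with a simple tally. The following is a minimal sketch, not the authors' method: the records, the field names (`source`, `age`, `stage`, `birthplace`), and the set of values treated as missing are all hypothetical and are not drawn from the California Cancer Registry. Real registries often encode "unknown" with dedicated codes, so the `MISSING` set would need to be adapted to the dataset's codebook.

```python
from collections import defaultdict

# Hypothetical records for illustration only (not registry data).
records = [
    {"source": "hospital", "age": 67, "stage": "II", "birthplace": "CA"},
    {"source": "hospital", "age": 54, "stage": None, "birthplace": None},
    {"source": "laboratory", "age": 71, "stage": None, "birthplace": None},
    {"source": "laboratory", "age": 80, "stage": "I", "birthplace": None},
]

# Assumption: these values count as "missing"; adjust to the real codebook.
MISSING = {None, "", "unknown"}


def percent_missing(rows, variables):
    """Return {variable: percent of rows whose value is missing}."""
    if not rows:
        return {v: 0.0 for v in variables}
    return {
        v: 100.0 * sum(1 for r in rows if r.get(v) in MISSING) / len(rows)
        for v in variables
    }


def percent_missing_by_source(rows, variables, source_key="source"):
    """Group rows by reporting source, then compute missingness per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.get(source_key)].append(r)
    return {src: percent_missing(g, variables) for src, g in groups.items()}


variables = ["age", "stage", "birthplace"]
overall = percent_missing(records, variables)
by_source = percent_missing_by_source(records, variables)

print(overall)
print(by_source)
```

Running a tally like this before any analysis shows which variables, and which reporting sources, are too sparse to support a given research question.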




