Biases introduced by filtering electronic health records for patients with "complete data".

Authors

Weber Griffin M, Adams William G, Bernstam Elmer V, Bickel Jonathan P, Fox Kathe P, Marsolo Keith, Raghavan Vijay A, Turchin Alexander, Zhou Xiaobo, Murphy Shawn N, Mandl Kenneth D

Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.

Publication information

J Am Med Inform Assoc. 2017 Nov 1;24(6):1134-1141. doi: 10.1093/jamia/ocx071.

DOI: 10.1093/jamia/ocx071
PMID: 29016972
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6080680/

Abstract

OBJECTIVE

One promise of nationwide adoption of electronic health records (EHRs) is the availability of data for large-scale clinical research studies. However, because the same patient could be treated at multiple health care institutions, data from only a single site might not contain the complete medical history for that patient, meaning that critical events could be missing. In this study, we evaluate how simple heuristic checks for data "completeness" affect the number of patients in the resulting cohort and introduce potential biases.

MATERIALS AND METHODS

We began with a set of 16 filters that check for the presence of demographics, laboratory tests, and other types of data, and then systematically applied all 2^16 (65,536) possible combinations of these filters to the EHR data for 12 million patients at 7 health care systems and to a separate payor claims database of 7 million members.
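
As a rough illustration of what applying every combination of filters involves: n presence filters yield 2^n subsets, so 16 filters give 2^16 = 65,536 cohorts to count. The sketch below is not the authors' code. The patient records, the three filter names, and the `cohort_sizes` helper are all hypothetical, and it assumes a filter "passes" when the named data type is present for the patient.

```python
from itertools import combinations

# Hypothetical toy records: each patient maps to the set of data types present.
patients = {
    "p1": {"demographics", "diagnosis", "medication", "lab"},
    "p2": {"demographics", "diagnosis"},
    "p3": {"demographics"},
    "p4": {"demographics", "medication", "diagnosis"},
}

# Hypothetical filters (the study used 16; three keep the example small).
filters = ["diagnosis", "medication", "lab"]


def cohort_sizes(patients, filters):
    """Map each subset of filters to the number of patients passing all of them."""
    sizes = {}
    for k in range(len(filters) + 1):
        for subset in combinations(filters, k):
            required = set(subset)
            # A patient passes when every required data type is present.
            sizes[frozenset(subset)] = sum(
                1 for present in patients.values() if required <= present
            )
    return sizes


sizes = cohort_sizes(patients, filters)
for subset, n in sorted(sizes.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subset), n)
```

The empty subset corresponds to the unfiltered population, so each filtered cohort can be compared against it to see how many patients a given combination removes.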

RESULTS

EHR data showed considerable variability in data completeness across sites and high correlation between data types. For example, the fraction of patients with diagnoses increased from 35.0% in all patients to 90.9% in those with at least 1 medication. An unrelated claims dataset independently showed that most filters select members who are older and more likely female and can eliminate large portions of the population whose data are actually complete.
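
The diagnosis example above compares a marginal fraction (patients with a diagnosis among all patients) with a conditional one (the same fraction among patients who have at least 1 medication). A minimal sketch of that comparison follows, using made-up records rather than the study data; the `fraction_with` helper and the data-type labels are assumptions for illustration only.

```python
def fraction_with(patients, data_type, given=None):
    """Fraction of patients having `data_type`, optionally among those having `given`."""
    pool = [p for p in patients if given is None or given in p]
    if not pool:
        return float("nan")
    return sum(1 for p in pool if data_type in p) / len(pool)


# Hypothetical records: each patient is the set of data types present.
patients = [
    {"diagnosis", "medication"},
    {"diagnosis", "medication"},
    {"medication"},
    {"diagnosis"},
    set(), set(), set(), set(),
]

overall = fraction_with(patients, "diagnosis")
conditional = fraction_with(patients, "diagnosis", given="medication")
print(f"all patients: {overall:.1%}; with >=1 medication: {conditional:.1%}")
```

A large gap between the two fractions indicates that the data types are correlated, which is why requiring one type of data also changes how often the others appear in the cohort.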

DISCUSSION AND CONCLUSION

When designing studies, investigators need to balance their confidence in the completeness of the data against the effect that data requirements have on the resulting patient cohort.
