Curtis Melissa D, Griffith Sandra D, Tucker Melisa, Taylor Michael D, Capra William B, Carrigan Gillis, Holzman Ben, Torres Aracelis Z, You Paul, Arnieri Brandon, Abernethy Amy P
Flatiron Health, New York, NY.
Genentech, South San Francisco, CA.
Health Serv Res. 2018 Dec;53(6):4460-4476. doi: 10.1111/1475-6773.12872. Epub 2018 May 14.
OBJECTIVE: To create a high-quality electronic health record (EHR)-derived mortality dataset for retrospective and prospective real-world evidence generation.
DATA SOURCES/STUDY SETTING: Oncology EHR data, supplemented with external commercial and US Social Security Death Index data, benchmarked to the National Death Index (NDI).
STUDY DESIGN: We developed a recent, linkable, high-quality mortality variable, amalgamated from multiple data sources to supplement EHR data and benchmarked against the most complete U.S. mortality data source, the NDI. Data quality of version 2.0 of the mortality variable is reported here.
PRINCIPAL FINDINGS: For advanced non-small-cell lung cancer, sensitivity of mortality information improved from 66 percent in EHR structured data to 91 percent in the composite dataset, with high date agreement compared to the NDI. For advanced melanoma, metastatic colorectal cancer, and metastatic breast cancer, sensitivity of the final variable was 85 to 88 percent. Kaplan-Meier survival analyses showed that improving mortality data completeness minimized overestimation of survival relative to NDI-based estimates.
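As an illustration of the two quality metrics named above, the following is a minimal sketch of how sensitivity and date agreement might be computed against a reference standard such as the NDI. The function name, the dictionary layout, the toy records, and the 15-day agreement window are all assumptions made for this example, not details taken from the study.

```python
from datetime import date


def sensitivity_and_date_agreement(reference, candidate, window_days=15):
    """Compare a candidate mortality source against a reference standard.

    Both arguments map patient_id -> date of death, or None if no death
    is recorded. window_days is an illustrative tolerance (assumption).
    """
    ref_deaths = {pid: d for pid, d in reference.items() if d is not None}
    if not ref_deaths:
        raise ValueError("reference contains no deaths")

    # Reference deaths that the candidate source also records as deaths.
    captured = [pid for pid in ref_deaths if candidate.get(pid) is not None]
    sensitivity = len(captured) / len(ref_deaths)

    # Among captured deaths, the share whose dates fall within the window.
    agreeing = sum(
        abs((candidate[pid] - ref_deaths[pid]).days) <= window_days
        for pid in captured
    )
    date_agreement = agreeing / len(captured) if captured else float("nan")
    return sensitivity, date_agreement


# Hypothetical toy records: four reference deaths, one patient alive.
reference = {
    "p1": date(2016, 1, 10),
    "p2": date(2016, 3, 5),
    "p3": date(2016, 6, 20),
    "p4": date(2016, 9, 1),
    "p5": None,
}
candidate = {
    "p1": date(2016, 1, 12),  # captured, 2 days off
    "p2": date(2016, 5, 1),   # captured, 57 days off
    "p3": date(2016, 6, 20),  # captured, exact match
    "p4": None,               # death missed by the candidate source
    "p5": None,
}

sens, agree = sensitivity_and_date_agreement(reference, candidate)
print(f"sensitivity={sens:.2f}, date agreement={agree:.2f}")
```

In this toy example the candidate source captures three of the four reference deaths, and two of those three dates fall within the assumed window.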
CONCLUSIONS: For EHR-derived data to yield reliable real-world evidence, the data need to be of known and sufficiently high quality. Considering the impact of mortality data completeness on survival endpoints, we highlight the importance of data quality assessment and advocate benchmarking to the NDI.
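The effect of incomplete death capture on survival estimates can be illustrated with a small Kaplan-Meier sketch. The estimator below is a plain-Python product-limit implementation written for this example, and the follow-up times are hypothetical. It assumes that a patient whose death is missed is treated as censored at the last recorded activity date.

```python
def kaplan_meier(times, events):
    """Product-limit estimator.

    times: follow-up time per patient; events: 1 = death observed,
    0 = censored. Returns a list of (time, survival) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [e for tt, e in data if tt == t]
        deaths = sum(at_t)
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= len(at_t)
        i += len(at_t)
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    surv = 1.0
    for time, s in curve:
        if time <= t:
            surv = s
    return surv


# Hypothetical cohort of five patients, all of whom die (months).
complete = kaplan_meier([2, 4, 6, 8, 10], [1, 1, 1, 1, 1])

# Same cohort with two deaths missed: those patients appear censored
# at their last recorded visit (months 3 and 7) instead.
incomplete = kaplan_meier([2, 3, 6, 7, 10], [1, 0, 1, 0, 1])

s_complete = survival_at(complete, 6)
s_incomplete = survival_at(incomplete, 6)
print(f"S(6) complete={s_complete:.3f}, incomplete={s_incomplete:.3f}")
```

With complete death data the estimated survival at month 6 is about 0.40. With two deaths missed it rises to about 0.53, which is the kind of overestimation that more complete mortality capture is meant to reduce.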