Computational Health Informatics, Boston Children's Hospital, Boston, Massachusetts, USA.
Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA.
BMJ Open. 2023 Feb 28;13(2):e065751. doi: 10.1136/bmjopen-2022-065751.
As highlighted by the COVID-19 pandemic, researchers are eager to make use of a wide variety of data sources, both government-sponsored and alternative, to characterise the epidemiology of infectious diseases. The objective of this study is to investigate the strengths and limitations of sources currently being used for research.
Retrospective descriptive analysis.
Yearly national-level and state-level disease-specific case counts and disease clusters for three diseases (measles, mumps and varicella) during a 5-year study period (2013-2017) across four data sources: Optum (health insurance billing claims data), HealthMap (online news surveillance data), Morbidity and Mortality Weekly Report (official government reports) and National Notifiable Diseases Surveillance System (government case surveillance data).
Our study demonstrated drastic differences in reported infectious disease incidence across data sources. When compared with the other three sources of interest, Optum data showed substantially higher, implausible standardised case counts for all three diseases. Although there was some concordance in identified state-level case counts and disease clusters, all four sources identified variations in state-level reporting.
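The abstract does not say how case counts were standardised. As a minimal sketch, assuming standardisation means expressing each source's count as a rate per 100,000 people in that source's covered population, the comparison could look like the following. The function name, source labels and all numbers are hypothetical and are not taken from the study.

```python
def cases_per_100k(case_count: int, population: int) -> float:
    """Scale a raw case count to a rate per 100,000 people.

    Assumption: `population` is the denominator covered by the data source
    (for example, enrolled members for claims data, or residents for
    government surveillance data).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return case_count * 100_000 / population


# Made-up inputs for illustration only; NOT values reported in the study.
example_sources = {
    "claims_source": {"cases": 50, "population": 1_000_000},
    "surveillance_source": {"cases": 600, "population": 300_000_000},
}

# Rates on a common scale, so sources with different denominators can be compared.
rates = {
    name: cases_per_100k(values["cases"], values["population"])
    for name, values in example_sources.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} cases per 100,000")
```

Putting every source on a common denominator is what makes a raw count from an insured population comparable with a national surveillance total; without it, differences in coverage alone would dominate the comparison.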
Researchers should consider data source limitations when attempting to characterise the epidemiology of infectious diseases. Some data sources, such as billing claims data, may be unsuitable for epidemiological research within the infectious disease context.