Wing Kevin, Bhaskaran Krishnan, Smeeth Liam, van Staa Tjeerd P, Klungel Olaf H, Reynolds Robert F, Douglas Ian
Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK.
Department of Pharmacoepidemiology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands.
Health eResearch Centre, University of Manchester, Manchester, UK.
BMJ Open. 2016 Sep 2;6(9):e012102. doi: 10.1136/bmjopen-2016-012102.
We aimed to create a 'multidatabase' algorithm for identification of cholestatic liver injury using multiple linked UK databases, before (1) assessing the improvement in case ascertainment compared to using a single database and (2) developing a new single-database case-definition algorithm, validated against the multidatabase algorithm.
Method development for case ascertainment.
Three UK population-based electronic health record databases: the UK Clinical Practice Research Datalink (CPRD), the UK Hospital Episode Statistics (HES) database and the UK Office for National Statistics (ONS) mortality database.
16 040 people over the age of 18 years with linked CPRD-HES records indicating potential cholestatic liver injury between 1 January 2000 and 1 January 2013.
(1) The number of cases of cholestatic liver injury detected by the multidatabase algorithm. (2) The relative contribution of each data source to multidatabase case status. (3) The ability of the new single-database algorithm to discriminate multidatabase algorithm case status.
Within the multidatabase case identification algorithm, 4033 of 16 040 potential cases (25%) were identified as definite cases based on CPRD data. HES data allowed possible cases to be discriminated from unlikely cases (947 of 16 040, 6%), but facilitated identification of only 1 definite case. ONS data did not contribute to case definition. The new single-database (CPRD-only) algorithm discriminated multidatabase case status very well (area under the receiver operating characteristic curve 0.95).
CPRD-HES-ONS linkage confers minimal improvement in cholestatic liver injury case ascertainment compared to using CPRD data alone, and a multidatabase algorithm provides little additional information for validation of a CPRD-only algorithm. The availability of laboratory test results within CPRD but not HES means that algorithms based on CPRD-HES-linked data may not always be merited for studies of liver injury, or for other outcomes relying primarily on laboratory test results.
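The proportions reported in the results can be checked with simple arithmetic, and the discrimination statistic can be illustrated with a rank-based calculation. The following is a minimal sketch only, not the authors' code: the function names `proportion` and `rank_auc` are hypothetical, and the scores passed to `rank_auc` are made-up toy values, not study data. Only the counts 4033, 947 and 16 040 come from the abstract.

```python
def proportion(count, total):
    """Return count / total as a fraction."""
    return count / total


def rank_auc(case_scores, noncase_scores):
    """Area under the ROC curve computed as the probability that a
    randomly chosen case scores higher than a randomly chosen
    non-case, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Counts taken from the abstract.
POTENTIAL_CASES = 16040
definite_from_cprd = proportion(4033, POTENTIAL_CASES)
resolved_by_hes = proportion(947, POTENTIAL_CASES)

print(f"Definite cases from CPRD data: {definite_from_cprd:.1%}")
print(f"Possible vs unlikely resolved by HES: {resolved_by_hes:.1%}")

# Toy illustration only: hypothetical algorithm scores, not study data.
toy_cases = [0.9, 0.8, 0.4]
toy_noncases = [0.7, 0.3, 0.2]
toy_auc = rank_auc(toy_cases, toy_noncases)
print(f"Toy AUC: {toy_auc:.3f}")
```

The two proportions round to the 25% and 6% quoted above. The toy AUC shows only how the statistic is defined and has no connection to the reported value of 0.95.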