Watson Jessica, Nicholson Brian D, Hamilton Willie, Price Sarah
Centre for Academic Primary Care, Bristol Medical School, University of Bristol, Bristol, UK.
Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK.
BMJ Open. 2017 Nov 22;7(11):e019637. doi: 10.1136/bmjopen-2017-019637.
Analysis of routinely collected electronic health record (EHR) data from primary care is reliant on the creation of codelists to define clinical features of interest. To improve scientific rigour, transparency and replicability, we describe and demonstrate a standardised reproducible methodology for clinical codelist development.
We describe a three-stage process for developing clinical codelists. First, the clinical feature of interest is clearly defined a priori using reliable clinical resources. Second, a list of potential codes is developed by using statistical software to search all available codes comprehensively. Third, a modified Delphi process is used to reach consensus among primary care practitioners on the most relevant codes, and an 'uncertainty' variable is generated to allow sensitivity analysis.
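The second stage, a comprehensive search of all available codes, can be sketched in Python as below. This is only an illustration under stated assumptions, not the authors' published syntax: the code dictionary, the code identifiers (`X001` and so on), the search terms and the function name `find_candidate_codes` are all hypothetical. The search is deliberately broad, so that clinically inappropriate matches are kept as candidates and left for reviewers to exclude in the third stage.

```python
import re

# Hypothetical code dictionary of (code, description) pairs.
# A real study would load the full clinical code dictionary instead.
CODE_DICTIONARY = [
    ("X001", "Shortness of breath"),
    ("X002", "Breathlessness on exertion"),
    ("X003", "Dyspnoea"),
    ("X004", "Family history of breathlessness"),
    ("X005", "Chest pain"),
    ("X006", "Short of breath on lying flat"),
]

# Illustrative search terms, assumed to come from the a priori definition
# of the clinical feature (stage one).
SEARCH_TERMS = ["short of breath", "shortness of breath", "breathless", "dyspn"]


def find_candidate_codes(dictionary, search_terms):
    """Return every (code, description) whose description contains any term."""
    pattern = re.compile(
        "|".join(re.escape(term) for term in search_terms), re.IGNORECASE
    )
    return [(code, desc) for code, desc in dictionary if pattern.search(desc)]


candidates = find_candidate_codes(CODE_DICTIONARY, SEARCH_TERMS)
for code, desc in candidates:
    print(code, desc)
```

Keeping the search terms in a single list makes the search auditable and easy to modify, which is the property the method relies on. Note that `X004` is returned as a candidate even though it describes family history; excluding it is a judgement for the reviewers, not the search.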
These methods are illustrated by developing a codelist for shortness of breath in a primary care EHR sample, including modifiable syntax for commonly used statistical software.
The codelist was used to estimate the frequency of shortness of breath in a cohort of 28 216 patients aged over 18 years who received an incident diagnosis of lung cancer between 1 January 2000 and 30 November 2016 in the Clinical Practice Research Datalink (CPRD).
Of 78 candidate codes, 29 were excluded as inappropriate. Of the remaining 49 codes, complete agreement was reached for 44 (90%), with partial disagreement over 5 (10%). In the cohort of 28 216 patients, 13 091 episodes of shortness of breath were identified. Sensitivity analysis demonstrated that codes with the greatest uncertainty tend to be rarely used in clinical practice.
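The arithmetic behind these percentages, and the way an 'uncertainty' flag can drive a sensitivity analysis, can be sketched as follows. The abstract does not give the scoring scheme, so the rule used here (unanimous votes count as agreement, split votes as uncertain) is an assumption, and the reviewer ratings, code identifiers and episode counts are invented for illustration only.

```python
# Check the reported percentages: 78 candidates, 29 excluded, 49 remaining.
remaining = 78 - 29
pct_agreed = round(100 * 44 / remaining)    # complete agreement
pct_uncertain = round(100 * 5 / remaining)  # partial disagreement


def categorise(votes):
    """Assumed rule: unanimous votes give agreement, split votes are uncertain."""
    unique = set(votes)
    if unique == {"include"}:
        return "agreed_include"
    if unique == {"exclude"}:
        return "agreed_exclude"
    return "uncertain"


# Hypothetical reviewer votes and episode counts per code.
ratings = {
    "X001": ["include", "include", "include"],
    "X002": ["include", "include", "include"],
    "X004": ["exclude", "exclude", "exclude"],
    "X006": ["include", "exclude", "include"],
}
episode_counts = {"X001": 120, "X002": 45, "X004": 30, "X006": 3}

categories = {code: categorise(votes) for code, votes in ratings.items()}

# Sensitivity analysis: count episodes with and without the uncertain codes.
strict_total = sum(
    n for code, n in episode_counts.items()
    if categories[code] == "agreed_include"
)
broad_total = sum(
    n for code, n in episode_counts.items()
    if categories[code] in ("agreed_include", "uncertain")
)

print(pct_agreed, pct_uncertain)
print(strict_total, broad_total)
```

Comparing the strict and broad totals shows how much the estimate depends on the disputed codes. In this invented example the difference is small because the uncertain code is rarely recorded.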
Although initially time consuming, a rigorous and reproducible method for codelist generation 'future-proofs' findings, and an auditable, modifiable syntax for codelist generation enables sharing and replication of EHR studies. Published codelists should be badged by quality and should report the methods of codelist generation, including the definitions and justifications associated with each codelist; the syntax or search method; the number of candidate codes identified; and the categorisation of codes after Delphi review.