Price Sarah J, Stapley Sal A, Shephard Elizabeth, Barraclough Kevin, Hamilton William T
Medical School, University of Exeter, College House, Exeter, UK.
Hoyland House, Painswick, UK.
BMJ Open. 2016 May 13;6(5):e011664. doi: 10.1136/bmjopen-2016-011664.
To estimate data loss and bias in studies of Clinical Practice Research Datalink (CPRD) data that restrict analyses to Read codes, omitting anything recorded as text.
Matched case-control study.
Patients contributing data to the CPRD.
4915 bladder and 3635 pancreatic cancer cases diagnosed between 1 January 2000 and 31 December 2009, matched on age, sex and general practice to up to 5 controls (bladder: n=21 718; pancreas: n=16 459). The analysis period was the year before cancer diagnosis.
Frequency of haematuria, jaundice and abdominal pain, grouped by recording style: Read code or text-only (ie, hidden text). The association between recording style and case-control status (χ² test). For each feature, the odds ratio (OR; conditional logistic regression) and positive predictive value (PPV; Bayes' theorem) for cancer, before and after addition of hidden text records.
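As a rough illustration of how a PPV can be obtained from case-control data with Bayes' theorem, the sketch below multiplies the prior odds of cancer by a likelihood ratio (the frequency of the feature in cases divided by its frequency in controls). The function name `ppv_from_case_control` and every input value are hypothetical and chosen only for illustration. The abstract does not report the prior or the per-group frequencies the authors used, so this is not a reconstruction of their calculation.

```python
def ppv_from_case_control(p_feature_cases, p_feature_controls, prior):
    """Posterior probability of cancer given the feature (Bayes' theorem, odds form).

    p_feature_cases    -- proportion of cases with the feature recorded
    p_feature_controls -- proportion of controls with the feature recorded
    prior              -- probability of cancer before the feature is known
    """
    likelihood_ratio = p_feature_cases / p_feature_controls
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical inputs, NOT taken from the study.
prior = 0.001

# Codes only: the feature is under-counted in both groups.
ppv_codes_only = ppv_from_case_control(0.40, 0.010, prior)

# Codes plus hidden text: the count rises proportionally more in controls,
# so the likelihood ratio falls and the PPV falls with it.
ppv_with_text = ppv_from_case_control(0.48, 0.017, prior)

print(f"codes only:      {ppv_codes_only:.2%}")
print(f"codes plus text: {ppv_with_text:.2%}")
```

If text-only recording is relatively more common in controls than in cases, leaving it out shrinks the denominator of the likelihood ratio more than the numerator, which is one way a code-only analysis could overstate a PPV.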
Of the 20 958 total records of the features, 7951 (38%) were recorded in hidden text. Hidden text recording was more strongly associated with controls than with cases for haematuria (140/336=42% vs 556/3147=18%) in bladder cancer (χ² test, p<0.001), and for jaundice (21/31=67% vs 463/1565=30%, p<0.0001) and abdominal pain (323/1126=29% vs 397/1789=22%, p<0.001) in pancreatic cancer. Adding hidden text records corrected PPVs of haematuria for bladder cancer from 4.0% (95% CI 3.5% to 4.6%) to 2.9% (2.6% to 3.2%), and of jaundice for pancreatic cancer from 12.8% (7.3% to 21.6%) to 6.3% (4.5% to 8.7%). Adding hidden text records did not alter the PPV of abdominal pain for bladder (codes: 0.14%, 0.13% to 0.16% vs codes plus hidden text: 0.14%, 0.13% to 0.15%) or pancreatic (0.23%, 0.21% to 0.25% vs 0.21%, 0.20% to 0.22%) cancer.
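The association tests above can be approximated from the reported counts alone. The sketch below builds a 2×2 table (controls vs cases, hidden text vs Read code) for each feature and applies Pearson's χ² test with 1 degree of freedom. It assumes no continuity correction, which the abstract does not specify, and the helper name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(hidden_controls, total_controls, hidden_cases, total_cases):
    """Pearson chi-squared statistic and p value for a 2x2 table (1 df).

    Assumption: no continuity correction is applied.
    """
    a = hidden_controls                   # controls, hidden text
    b = total_controls - hidden_controls  # controls, Read code
    c = hidden_cases                      # cases, hidden text
    d = total_cases - hidden_cases        # cases, Read code
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts as reported in the abstract: (hidden in controls, all controls,
# hidden in cases, all cases).
reported = {
    "haematuria (bladder)": (140, 336, 556, 3147),
    "jaundice (pancreas)": (21, 31, 463, 1565),
    "abdominal pain (pancreas)": (323, 1126, 397, 1789),
}

results = {name: chi2_2x2(*counts) for name, counts in reported.items()}

for name, (chi2, p_value) in results.items():
    print(f"{name}: chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

Under these assumptions all three p values fall below the thresholds quoted in the text, although the authors' exact test settings may have differed.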
Omission of text records from CPRD studies introduces bias that inflates outcome measures for recognised alarm symptoms. This potentially reinforces clinicians' views of the known importance of these symptoms, marginalising the significance of 'low-risk but not no-risk' symptoms.