Is omission of free text records a possible source of data loss and bias in Clinical Practice Research Datalink studies? A case-control study.

Author information

Price Sarah J, Stapley Sal A, Shephard Elizabeth, Barraclough Kevin, Hamilton William T

Affiliations

Medical School, University of Exeter, College House, Exeter, UK.

Hoyland House, Painswick, UK.

Publication information

BMJ Open. 2016 May 13;6(5):e011664. doi: 10.1136/bmjopen-2016-011664.

PMID: 27178981
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4874123/
Abstract

OBJECTIVES

To estimate data loss and bias in studies of Clinical Practice Research Datalink (CPRD) data that restrict analyses to Read codes, omitting anything recorded as text.

DESIGN

Matched case-control study.

SETTING

Patients contributing data to the CPRD.

PARTICIPANTS

4915 bladder and 3635 pancreatic cancer cases diagnosed between 1 January 2000 and 31 December 2009, matched on age, sex and general practitioner practice to up to 5 controls (bladder: n=21 718; pancreas: n=16 459). The analysis period was the year before cancer diagnosis.
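Each case here forms a matched set with up to five controls, and the outcome measures below estimate odds ratios by conditional logistic regression over those sets. Conditional logistic regression needs an iterative fit. The sketch below therefore swaps in a simpler closed-form estimator for one binary feature: the Mantel-Haenszel odds ratio, with each case and its matched controls treated as one stratum. It illustrates that assumption and is not the authors' analysis. The function name and the example matched sets are hypothetical.

```python
from typing import Sequence, Tuple

# One matched set: (does the case have the feature?, feature flags of its controls).
MatchedSet = Tuple[bool, Sequence[bool]]


def mantel_haenszel_matched_or(sets: Sequence[MatchedSet]) -> float:
    """Mantel-Haenszel odds ratio, treating each 1:M matched set as a stratum."""
    numerator = 0.0    # sum of (exposed cases * unexposed controls) / set size
    denominator = 0.0  # sum of (unexposed cases * exposed controls) / set size
    for case_exposed, controls in sets:
        n = 1 + len(controls)
        exposed_controls = sum(1 for flag in controls if flag)
        unexposed_controls = len(controls) - exposed_controls
        if case_exposed:
            numerator += unexposed_controls / n
        else:
            denominator += exposed_controls / n
    if denominator == 0:
        raise ValueError("undefined: no set pairs an unexposed case with an exposed control")
    return numerator / denominator


# Hypothetical matched sets with four controls each (not study data).
example_sets = [
    (True, [False, False, False, False]),
    (False, [True, False, False, False]),
    (True, [False, True, False, False]),
]
print(mantel_haenszel_matched_or(example_sets))
```

Sets in which the case and all controls agree on the feature contribute nothing to either sum. This mirrors how conditional methods draw information only from discordant sets.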

PRIMARY AND SECONDARY OUTCOME MEASURES

Frequency of haematuria, jaundice and abdominal pain, grouped by recording style: Read code or text-only (ie, hidden text). The association between recording style and case-control status (χ² test). For each feature, the odds ratio (OR; conditional logistic regression) and positive predictive value (PPV; Bayes' theorem) for cancer, before and after addition of hidden text records.
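The PPV above comes from Bayes' theorem. The minimal sketch below shows the generic relation, assuming three inputs: the feature's frequency among cases, its frequency among controls (standing in for people without cancer), and a prior probability such as population incidence. The authors' exact computation may differ. The function name and the numbers in the example are hypothetical and are not figures from the study.

```python
def ppv_bayes(p_feature_given_cancer: float,
              p_feature_given_no_cancer: float,
              prior: float) -> float:
    """Positive predictive value P(cancer | feature) from Bayes' theorem."""
    for value in (p_feature_given_cancer, p_feature_given_no_cancer, prior):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    true_pos = p_feature_given_cancer * prior            # P(feature and cancer)
    false_pos = p_feature_given_no_cancer * (1.0 - prior)  # P(feature and no cancer)
    total = true_pos + false_pos                         # P(feature)
    if total == 0:
        raise ValueError("feature never occurs, so the PPV is undefined")
    return true_pos / total


# Hypothetical illustration: adding text records raises the feature's frequency
# proportionally more in controls (0.002 -> 0.004) than in cases (0.10 -> 0.12).
prior = 0.001
codes_only = ppv_bayes(0.10, 0.002, prior)
with_text = ppv_bayes(0.12, 0.004, prior)
print(f"codes only: {codes_only:.2%}, codes plus text: {with_text:.2%}")
```

This shows the mechanism the abstract describes. When omitted text records are concentrated in controls, a codes-only analysis understates P(feature | no cancer) and so overstates the PPV.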

RESULTS

Of the 20 958 total records of the features, 7951 (38%) were recorded in hidden text. Hidden text recording was more strongly associated with controls than with cases for haematuria (140/336=42% vs 556/3147=18%) in bladder cancer (χ² test, p<0.001), and for jaundice (21/31=67% vs 463/1565=30%, p<0.0001) and abdominal pain (323/1126=29% vs 397/1789=22%, p<0.001) in pancreatic cancer. Adding hidden text records corrected PPVs of haematuria for bladder cancer from 4.0% (95% CI 3.5% to 4.6%) to 2.9% (2.6% to 3.2%), and of jaundice for pancreatic cancer from 12.8% (7.3% to 21.6%) to 6.3% (4.5% to 8.7%). Adding hidden text records did not alter the PPV of abdominal pain for bladder (codes: 0.14%, 0.13% to 0.16% vs codes plus hidden text: 0.14%, 0.13% to 0.15%) or pancreatic (0.23%, 0.21% to 0.25% vs 0.21%, 0.20% to 0.22%) cancer.
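The counts reported in the results can be rearranged into 2×2 tables (controls vs cases by hidden text vs Read code) and checked with a χ² test. The sketch below assumes a Pearson χ² test without continuity correction, which the abstract does not specify. It should therefore reproduce the direction and significance thresholds rather than the authors' exact statistics. It also recomputes the 38% share of records held only in text. The function and variable names are illustrative.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Returns the statistic and its p-value with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# (hidden-text records, all records) for controls, then cases, as reported above.
REPORTED = {
    "haematuria, bladder cancer": ((140, 336), (556, 3147)),
    "jaundice, pancreatic cancer": ((21, 31), (463, 1565)),
    "abdominal pain, pancreatic cancer": ((323, 1126), (397, 1789)),
}

results = {}
for feature, ((h_ctrl, n_ctrl), (h_case, n_case)) in REPORTED.items():
    # Rows: controls, cases. Columns: hidden text only, Read code.
    results[feature] = chi2_2x2(h_ctrl, n_ctrl - h_ctrl, h_case, n_case - h_case)

hidden_share = 7951 / 20958  # share of all feature records held only in text

for feature, (stat, p) in results.items():
    print(f"{feature}: chi2 = {stat:.1f}, p = {p:.2g}")
print(f"hidden text share: {hidden_share:.0%}")
```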

CONCLUSIONS

Omission of text records from CPRD studies introduces bias that inflates outcome measures for recognised alarm symptoms. This potentially reinforces clinicians' views of the known importance of these symptoms, marginalising the significance of 'low-risk but not no-risk' symptoms.
