Term sets: A transparent and reproducible representation of clinical code sets.

Affiliations

Greater Manchester Patient Safety Translational Research Centre, University of Manchester, Manchester, United Kingdom.

Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, United Kingdom.

Publication information

PLoS One. 2019 Feb 14;14(2):e0212291. doi: 10.1371/journal.pone.0212291. eCollection 2019.

DOI:10.1371/journal.pone.0212291

PMID:30763407

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6375602/

Abstract

OBJECTIVE

Clinical code sets are vital to research using routinely-collected electronic healthcare data. Existing code set engineering methods pose significant limitations when considering reproducible research. To improve the transparency and reusability of research, these code sets must abide by FAIR principles; this is not currently happening. We propose 'term sets', an equivalent alternative to code sets that are findable, accessible, interoperable and reusable.

MATERIALS AND METHODS

We describe a new code set representation, consisting of natural language inclusion and exclusion terms (term sets), and explain its relationship to code sets. We formally prove that any code set has a corresponding term set. We demonstrate utility by searching for recently published code sets, representing them as term sets, and reporting on the number of inclusion and exclusion terms compared with the size of the code set.
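To make the relationship between a term set and a code set concrete, here is a minimal sketch. It assumes that a code belongs to the resulting code set when its description contains at least one inclusion term and no exclusion term, using case-insensitive substring matching. That matching rule, the function name `resolve_term_set`, and the toy terminology with its `X00n` codes are all hypothetical illustrations; they are not taken from the paper or from the authors' software.

```python
from typing import Dict, Iterable, Set


def resolve_term_set(
    terminology: Dict[str, str],
    include: Iterable[str],
    exclude: Iterable[str] = (),
) -> Set[str]:
    """Return the codes whose description matches the term set.

    Assumption: a description matches when it contains any inclusion term
    and none of the exclusion terms (case-insensitive substring match).
    """
    inclusion_terms = [term.lower() for term in include]
    exclusion_terms = [term.lower() for term in exclude]

    matched: Set[str] = set()
    for code, description in terminology.items():
        text = description.lower()
        is_included = any(term in text for term in inclusion_terms)
        is_excluded = any(term in text for term in exclusion_terms)
        if is_included and not is_excluded:
            matched.add(code)
    return matched


# Hypothetical toy terminology: these codes and descriptions are invented
# for illustration and do not come from any real coding system.
TOY_TERMINOLOGY = {
    "X001": "Type 1 diabetes mellitus",
    "X002": "Type 2 diabetes mellitus",
    "X003": "Diabetes insipidus",
    "X004": "Family history of diabetes mellitus",
    "X005": "Essential hypertension",
    "X006": "Diabetes mellitus with nephropathy",
}

code_set = resolve_term_set(
    TOY_TERMINOLOGY,
    include=["diabetes mellitus"],
    exclude=["family history"],
)
print(sorted(code_set))
```

In this toy case, one inclusion term and one exclusion term stand in for three codes. Because the terms are matched against descriptions rather than stored as codes, the same term set could in principle be re-run against a different or updated terminology.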

RESULTS

Thirty-one code sets from 20 papers covering diverse disease domains were converted into term sets. The term sets were on average 74% the size of their equivalent original code set. Four term sets were larger due to deficiencies in the original code sets.

DISCUSSION

Term sets can concisely represent any code set. This may reduce barriers for examining and reusing code sets, which may accelerate research using healthcare databases. We have developed open-source software that supports researchers using term sets.

CONCLUSION

Term sets are independent of clinical code terminologies and therefore: enable reproducible research; are resistant to terminology changes; and are less error-prone as they are shorter than the equivalent code set.
