Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records.

Affiliations

Department of Biomedical Informatics, Columbia University, NY, USA.

Institute of Data Science, Maastricht University, Maastricht, The Netherlands.

Publication information

Sci Data. 2018 Nov 27;5:180273. doi: 10.1038/sdata.2018.273.

DOI:10.1038/sdata.2018.273
PMID:30480666
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6257042/
Abstract

Columbia Open Health Data (COHD) is a publicly accessible database of electronic health record (EHR) prevalence and co-occurrence frequencies between conditions, drugs, procedures, and demographics. COHD was derived from Columbia University Irving Medical Center's Observational Health Data Sciences and Informatics (OHDSI) database. The lifetime dataset, derived from all records, contains 36,578 single concepts (11,952 conditions, 12,334 drugs, and 10,816 procedures) and 32,788,901 concept pairs from 5,364,781 patients. The 5-year dataset, derived from records from 2013-2017, contains 29,964 single concepts (10,159 conditions, 10,264 drugs, and 8,270 procedures) and 15,927,195 concept pairs from 1,790,431 patients. Exclusion of rare concepts (count ≤ 10) and Poisson randomization enable data sharing by eliminating risks to patient privacy. EHR prevalences are informative of healthcare consumption rates. Analysis of co-occurrence frequencies via relative frequency analysis and observed-expected frequency ratio are informative of associations between clinical concepts, useful for biomedical research tasks such as drug repurposing and pharmacovigilance. COHD is publicly accessible through a web application-programming interface (API) and downloadable from the Figshare repository. The code is available on GitHub.
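
The privacy steps and association measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. All function names are hypothetical, and several details are assumptions: that rare concepts are excluded before counts are perturbed, that Poisson randomization means replacing each count with a draw from a Poisson distribution whose mean is the true count, that relative frequency is the pair count divided by one concept's count, and that the observed-expected ratio is reported as a natural log with the expected count computed under independence.

```python
import math
import random

MIN_COUNT = 10  # the abstract excludes rare concepts with count <= 10


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's method.

    The mean is split into chunks of at most 500 so exp(-chunk) does not
    underflow; a sum of independent Poisson draws is itself Poisson.
    """
    total = 0
    remaining = lam
    while remaining > 0:
        chunk = min(remaining, 500.0)
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
        remaining -= chunk
    return total


def privatize(counts: dict[str, int], rng: random.Random) -> dict[str, int]:
    """Drop rare concepts, then perturb each remaining count (assumed order)."""
    return {
        concept: poisson_sample(count, rng)
        for concept, count in counts.items()
        if count > MIN_COUNT
    }


def relative_frequency(pair_count: int, base_count: int) -> float:
    """Share of patients with the base concept who also have the other concept."""
    return pair_count / base_count


def log_obs_exp_ratio(pair_count: int, count_a: int, count_b: int,
                      n_patients: int) -> float:
    """Natural log of observed pair count over the count expected if the
    two concepts occurred independently (the log is an assumption)."""
    expected = count_a * count_b / n_patients
    return math.log(pair_count / expected)
```

A ratio above zero under these assumptions means the two concepts co-occur more often than independence would predict, which is the kind of signal the abstract describes as useful for drug repurposing and pharmacovigilance.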

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc95/6257042/3791fb3ff918/sdata2018273-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc95/6257042/c8ea6a9a8b86/sdata2018273-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc95/6257042/e4b76b367163/sdata2018273-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc95/6257042/e40935915ed9/sdata2018273-f3.jpg
