A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history.

Affiliations

Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands.

Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands.

Publication information

J Am Med Inform Assoc. 2022 Apr 13;29(5):761-769. doi: 10.1093/jamia/ocac008.

DOI: 10.1093/jamia/ocac008

PMID: 35139533

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9122640/

Abstract

OBJECTIVE

To facilitate identification of patient disease subsets and risk factors by constructing a pipeline that is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health record (EHR) batch effects.

MATERIAL AND METHODS

We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features.
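The abstract says only that center-specific batch effects were mitigated with "tools borrowed from single-cell omics" and does not describe the correction itself. As a minimal sketch of the underlying idea, assuming a binary patient-by-code matrix, the code below applies per-center mean-centering: each center's average rate for each billing code is subtracted from that center's patients, so a code that one center records for nearly everyone stops separating its patients from those of other centers. This is a deliberately simple swapped-in technique, not the authors' method, and it is much cruder than single-cell integration tools such as Harmony, which iteratively corrects a PCA embedding. The function name `center_by_site` and the toy data are hypothetical.

```python
from collections import defaultdict


def center_by_site(matrix, sites):
    """Subtract each site's per-code mean from the rows of that site's patients.

    matrix: list of patient rows, each a list of 0/1 billing-code indicators
    sites:  the site (center) label of each row
    """
    n_codes = len(matrix[0])
    totals = defaultdict(lambda: [0.0] * n_codes)
    counts = defaultdict(int)
    for row, site in zip(matrix, sites):
        counts[site] += 1
        for j, value in enumerate(row):
            totals[site][j] += value
    # Per-site mean rate of each code.
    means = {s: [t / counts[s] for t in totals[s]] for s in totals}
    return [[value - means[site][j] for j, value in enumerate(row)]
            for row, site in zip(matrix, sites)]


# Hypothetical toy data: 4 patients, 3 codes, 2 centers. Center "B" records
# code 0 for every patient (a local billing habit), so code 0 says nothing
# about the patients themselves.
X = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]
site = ["A", "A", "B", "B"]
corrected = center_by_site(X, site)
print(corrected[0], corrected[2])  # [0.0, 0.5, -0.5] [0.0, 0.5, -0.5]
```

In the toy run, patients 0 and 2 (and likewise 1 and 3) have the same history apart from center B's habit on code 0; after centering their rows are identical, so a downstream clustering step would group them regardless of center. The cost is that plain centering also erases genuine case-mix differences between centers, which is why dedicated integration methods try to separate site effects from structure shared across sites.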

RESULTS

We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles.
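The replication result can be read as a correlation between cluster profiles computed in different cohorts. A minimal sketch, assuming that a cluster's phenotypic profile is the fraction of its patients carrying each billing code, that the same cluster's profile is computed separately in each cohort, and that Pearson correlation is the similarity measure; the abstract states neither which correlation was used nor how clusters were matched across cohorts. The helper names `profile`, `pearson`, and `replicating_cohorts` and all numbers are hypothetical.

```python
import math


def profile(matrix, labels, cluster):
    """Fraction of a cluster's patients that carry each code."""
    rows = [row for row, label in zip(matrix, labels) if label == cluster]
    return [sum(column) / len(rows) for column in zip(*rows)]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return math.nan
    return cov / (sd_x * sd_y)


def replicating_cohorts(reference, cohort_profiles, threshold=0.75):
    """Names of cohorts whose profile correlates with the reference at >= threshold."""
    return [name for name, p in cohort_profiles.items()
            if pearson(reference, p) >= threshold]


# Hypothetical profiles of one cluster over four billing codes.
reference = [0.9, 0.8, 0.1, 0.0]
cohorts = {
    "cohort_1": [0.85, 0.7, 0.2, 0.05],  # same pattern: replicates
    "cohort_2": [0.1, 0.2, 0.9, 0.8],    # reversed pattern: does not
}
print(replicating_cohorts(reference, cohorts))  # ['cohort_1']
```

Repeating this count for every cluster across all 12 cohorts gives the kind of per-cluster replication count that the abstract summarizes as an average over clusters.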

DISCUSSION

Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.

CONCLUSION

We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad8e/9122640/2ffcd691a54e/ocac008f1.jpg


