Leveraging multi-site electronic health data for characterization of subtypes: a pilot study of dementia in the N3C Clinical Tenant.

Authors

Sharma Suchetha, Liu Jiebei, Abramowitz Amy Caroline, Geary Carol Reynolds, Johnston Karen C, Manning Carol, Van Horn John Darrell, Zhou Andrea, Anzalone Alfred J, Loomba Johanna, Pfaff Emily, Brown Don

Affiliations

School of Data Science, University of Virginia, Charlottesville, VA 22903, United States.

Department of Systems Engineering, University of Virginia, Charlottesville, VA 22904, United States.

Publication information

JAMIA Open. 2024 Aug 6;7(3):ooae076. doi: 10.1093/jamiaopen/ooae076. eCollection 2024 Oct.

DOI:10.1093/jamiaopen/ooae076
PMID:39132679
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11316614/
Abstract

OBJECTIVES

To provide a foundational methodology for differentiating comorbidity patterns in subphenotypes through investigation of a multi-site dementia patient dataset.

MATERIALS AND METHODS

Using the National Clinical Cohort Collaborative Tenant Pilot (N3C Clinical) dataset, our approach integrates two machine learning algorithms, logistic regression and eXtreme Gradient Boosting (XGBoost), with a diagnostic hierarchical model to classify dementia subtypes based on comorbidities and gender. The methodology draws on multi-site electronic health record (EHR) data and addresses class imbalance with a hybrid sampling strategy that combines 65% Synthetic Minority Over-sampling Technique (SMOTE), 35% Random Under-Sampling (RUS), and Tomek Links. The hierarchical model further refines the analysis, allowing for a layered understanding of disease patterns.
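The hybrid sampling idea can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, and the abstract does not define what the 65% and 35% refer to. The sketch assumes one possible reading: 65% of the gap between majority and minority class sizes is closed by SMOTE-style interpolation, the remaining 35% by random under-sampling, and Tomek links are then removed from the majority class. All function names, the `over_share` parameter, and the toy two-dimensional data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


def random_under_sample(majority, n_remove, rng=random):
    """Keep a random subset of the majority class, dropping n_remove points."""
    return rng.sample(majority, len(majority) - n_remove)


def remove_tomek_links(majority, minority):
    """Drop majority points that form a Tomek link: a majority/minority
    pair in which each point is the other's nearest neighbour."""
    points = [(p, 0) for p in majority] + [(p, 1) for p in minority]

    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i][0], points[j][0]),
        )

    nn = [nearest(i) for i in range(len(points))]
    linked = {
        i for i, j in enumerate(nn)
        if nn[j] == i and points[i][1] == 0 and points[j][1] == 1
    }
    return [p for i, (p, label) in enumerate(points)
            if label == 0 and i not in linked]


def hybrid_resample(majority, minority, over_share=0.65, seed=0):
    """Assumed reading of the 65%/35% split: close over_share of the class
    gap by over-sampling and the rest by under-sampling, then clean the
    class boundary with Tomek link removal."""
    rng = random.Random(seed)
    gap = len(majority) - len(minority)
    n_new = round(over_share * gap)
    n_remove = gap - n_new
    minority_out = minority + smote(minority, n_new, rng=rng)
    majority_out = random_under_sample(majority, n_remove, rng=rng)
    majority_out = remove_tomek_links(majority_out, minority_out)
    return majority_out, minority_out


# Toy data: 120 majority and 20 minority points in two dimensions.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(120)]
minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]

maj_out, min_out = hybrid_resample(majority, minority)
print(len(maj_out), len(min_out))
```

With these toy sizes the gap is 100, so the minority class grows by 65 synthetic points and the majority class loses 35 points before Tomek link removal, which may remove a few more. In practice a maintained library would normally replace these hand-written helpers.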

RESULTS

The study identified significant comorbidity patterns associated with diagnosis of Alzheimer's, Vascular, and Lewy Body dementia subtypes. The classification models achieved accuracies up to 69% for Alzheimer's/Vascular dementia and highlighted challenges in distinguishing Dementia with Lewy Bodies. The hierarchical model elucidates the complexity of diagnosing Dementia with Lewy Bodies and reveals the potential impact of regional clinical practices on dementia classification.

CONCLUSION

Our methodology underscores the importance of leveraging multi-site datasets and tailored sampling techniques for dementia research. This framework holds promise for extending to other disease subtypes, offering a pathway to more nuanced and generalizable insights into dementia and its complex interplay with comorbid conditions.

DISCUSSION

This study underscores the critical role of multi-site data analyses in understanding the relationship between comorbidities and disease subtypes. By utilizing diverse healthcare data, we emphasize the need to consider site-specific differences in clinical practices and patient demographics. Despite challenges like class imbalance and variability in EHR data, our findings highlight the essential contribution of multi-site data to developing accurate and generalizable models for disease classification.
