Scalable Bayesian Nonparametric Method for Clinical Risk Prediction Using Large-Scale Data from Heterogeneous Populations.

Author information

Dong Ning, Nair Nandini, Du Dongping

Publication information

IEEE J Biomed Health Inform. 2025 May 7;PP. doi: 10.1109/JBHI.2025.3567944.

Abstract

While analyzing large clinical datasets allows for the identification of complex patterns to achieve increased risk prediction accuracy, it also presents challenges for existing risk modeling techniques due to patient heterogeneity and the ever-evolving volume and distributions of data. Bayesian nonparametric methods, such as the Dirichlet Process Mixture Model (DPMM), offer a promising solution for modeling data with mixed and overlapping distributions. However, the approach is computationally prohibitive when applied to large datasets, which greatly limits practical applications. In this study, we propose a scalable framework for efficiently constructing DPMMs from large clinical datasets. To improve computational efficiency, we divide the full dataset into smaller subsets and learn DPMMs within individual sets. Additionally, we adopt a recentered pseudo-barycenter to approximate the posterior density of the full dataset and design a new algorithm to generate a consistent clustering rule from the subset posteriors with unequal numbers of components. The method was validated through a simulation study and a case study predicting the survival of heart failure patients post-left ventricular assist device implantation. The results demonstrated improved accuracy compared to benchmark models such as the Cox proportional hazards model and random survival forests. Our modeling framework adaptively clusters patients with distinct risk profiles into subgroups and predicts their probabilities of developing adverse events from overlapping posterior mixtures, providing an effective approach for addressing patient heterogeneity and enhancing risk prediction accuracy.
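
To illustrate why DPMMs learned on separate subsets can end up with unequal numbers of components, the sketch below samples partitions from a Chinese restaurant process, the clustering prior induced by a Dirichlet process. It is not the authors' algorithm: it draws from the prior only, fits no data, and omits the pseudo-barycenter step entirely. The function names, the round-robin split, the subset count of 4, and the concentration value `alpha=1.0` are all assumptions made for illustration.

```python
import random

def crp_partition(n, alpha, rng):
    """Sample cluster sizes for n items from a Chinese restaurant process."""
    sizes = []  # sizes[k] = number of items currently in cluster k
    for i in range(n):
        # Item i joins cluster k with probability sizes[k] / (i + alpha),
        # or opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if u < acc:
                sizes[k] += 1
                break
        else:
            sizes.append(1)  # no existing cluster chosen: open a new one
    return sizes

def split_into_subsets(items, n_subsets):
    """Round-robin split of a list into n_subsets roughly equal parts."""
    return [items[j::n_subsets] for j in range(n_subsets)]

rng = random.Random(0)
patients = list(range(1000))            # hypothetical patient indices
subsets = split_into_subsets(patients, 4)

# One prior draw per subset; the number of clusters is not fixed in advance.
cluster_counts = [len(crp_partition(len(s), alpha=1.0, rng=rng)) for s in subsets]
print(cluster_counts)
```

Because the number of clusters is random rather than fixed, the per-subset counts generally differ from one subset to the next, which is the situation the abstract's merging algorithm is designed to handle.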
