Williams Nick
The Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA.
AMIA Annu Symp Proc. 2025 May 22;2024:1235-1244. eCollection 2024.
Evidence-based medicine and health data for policy should update statistical data modeling methods to take advantage of at-scale data. One challenge with at-scale data is information segmentation for clinical presentation discovery and cohort assignment. We use a gradient boosting machine (GBM) to segment 94,379,175,015 diagnostic clinical events attributable to 283,632,789 Centers for Medicare and Medicaid Services beneficiaries over 22 observation years. Diagnostic events were aggregated into attack rates by demography and phenome-wide association study (PheWAS) codes. The resulting attack rates were then segmented within a user-defined clinical status of interest, in this case HIV status. A total of 1,753,647 HIV+ beneficiaries were considered. The GBM model assigned 19,651,408 PheWAS attack rates, derived from 69,133,296 ICD attack rates, into phenogroups/nodes.
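The aggregation step described in the abstract, rolling diagnostic events up into attack rates by demographic stratum and code, can be sketched as follows. This is a minimal illustration and not the author's pipeline: the function name `attack_rates`, the stratum labels, the placeholder code strings, and the definition of an attack rate as distinct beneficiaries with a code divided by the stratum population are all assumptions made for the example.

```python
from collections import defaultdict


def attack_rates(events, population):
    """Aggregate diagnostic events into attack rates.

    events: iterable of (beneficiary_id, stratum, code) tuples (hypothetical layout).
    population: dict mapping stratum -> number of beneficiaries in that stratum.
    Returns a dict mapping (stratum, code) -> attack rate.
    """
    cases = defaultdict(set)
    for beneficiary_id, stratum, code in events:
        # A set keeps each beneficiary once per (stratum, code),
        # so repeated events for the same person do not inflate the rate.
        cases[(stratum, code)].add(beneficiary_id)
    return {key: len(ids) / population[key[0]] for key, ids in cases.items()}


# Toy data; the strata and codes are placeholders, not real study values.
events = [
    (1, "F_65-74", "code_A"),
    (1, "F_65-74", "code_A"),  # repeat event for the same beneficiary
    (2, "F_65-74", "code_A"),
    (3, "M_65-74", "code_B"),
]
population = {"F_65-74": 4, "M_65-74": 2}
rates = attack_rates(events, population)
```

In a workflow like the one described, a table of such rates, restricted to the clinical status of interest, would then be passed to a gradient boosting model for segmentation into phenogroups; that modeling step is not shown here.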