Oshingbesan Adebayo, Kamp Michelle, Mpangase Phelelani Thokozani, Adetunji Kayode, Iddi Samuel, Nderitu Daniel Maina, Akumu Tanya, Achilonu Okechinyere, Kisiangani Isaac, Mathema Theophilous, Tadesse Girmaw, Gomez-Olive F Xavier, Kabudula Chodziwadziwa Whiteson, Hazelhurst Scott, Asiki Gershim, Ramsay Michele, Speakman Skyler
IBM Research Africa, Nairobi, Kenya.
Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
Sci Rep. 2025 Apr 22;15(1):13992. doi: 10.1038/s41598-025-96569-4.
This work provides three contributions that straddle the medical literature on multimorbidity and the data science community with an interest in exploratory analysis of health-related research data. First, we propose a definition of multimorbidity as the co-occurrence of at least two disease diagnoses from a pre-determined list. This interpretation adds to a growing body of working definitions emerging from the literature. Second, we apply this novel outcome-of-interest to two sub-Saharan populations located in Nairobi, Kenya and Agincourt, South Africa. The source data for this analysis were collected as part of the Africa Wits-INDEPTH Partnership for Genomic Studies project. Third, we stratify this outcome-of-interest across all possible sub-populations and identify sub-populations with anomalously high (or low) rates of multimorbidity. Critically, the automatic stratification approach emphasizes efficient, disciplined exploratory analysis as a complement to more commonly used confirmatory analysis methods. Our results show two things. First, high-risk sub-populations identified in one part of the continent transfer to the other location (and vice versa): the equivalent sub-population at the other location also experiences higher rates of multimorbidity. Second, we discover a real-world scenario in which a more-at-risk sub-population exists beyond the simpler sub-populations traditionally stratified by age and sex. This is in contrast to existing literature, which commonly stratifies disease diagnoses by sex when reporting results. Patterns in diseases, and in healthcare more generally, are likely more nuanced than manual approaches can describe. This work helps introduce public health researchers to data science methods that scale to the size and complexity of modern-day datasets.
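To make the definition and the stratification idea concrete, the following is a minimal Python sketch, not the authors' implementation. It flags a participant as multimorbid when at least two diagnoses from a list co-occur, then enumerates every sub-population by brute force and ranks each by how far its multimorbidity rate departs from the overall rate. The disease list, the feature names (`sex`, `age_band`), the `min_size` cut-off, and the six records are all hypothetical and synthetic; the abstract does not specify the scanning algorithm, so plain exhaustive enumeration is used here as a stand-in.

```python
from itertools import combinations, product

# Hypothetical disease list; the list used in the study is not given in the abstract.
DISEASES = ("hypertension", "diabetes", "hiv", "obesity")


def is_multimorbid(record, diseases=DISEASES, threshold=2):
    """Return True when at least `threshold` listed diagnoses co-occur."""
    return sum(bool(record.get(d)) for d in diseases) >= threshold


def stratify(records, features, min_size=2):
    """Enumerate every sub-population obtained by fixing one value for each
    feature in a subset of `features`, and rank them by the difference
    between their multimorbidity rate and the overall rate."""
    overall = sum(is_multimorbid(r) for r in records) / len(records)
    results = []
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            values = [sorted({r[f] for r in records}) for f in subset]
            for combo in product(*values):
                group = [
                    r for r in records
                    if all(r[f] == v for f, v in zip(subset, combo))
                ]
                if len(group) < min_size:  # skip groups too small to compare
                    continue
                rate = sum(is_multimorbid(r) for r in group) / len(group)
                results.append(
                    (dict(zip(subset, combo)), len(group), rate, rate - overall)
                )
    # Highest excess rate first; the tail of the list holds the low-rate groups.
    results.sort(key=lambda row: row[3], reverse=True)
    return overall, results


# Synthetic toy records, for illustration only (not study data).
records = [
    {"sex": "F", "age_band": "40-49", "hypertension": 1, "diabetes": 1, "hiv": 0, "obesity": 0},
    {"sex": "F", "age_band": "40-49", "hypertension": 1, "diabetes": 0, "hiv": 0, "obesity": 1},
    {"sex": "F", "age_band": "50-59", "hypertension": 0, "diabetes": 0, "hiv": 0, "obesity": 0},
    {"sex": "M", "age_band": "40-49", "hypertension": 1, "diabetes": 0, "hiv": 0, "obesity": 0},
    {"sex": "M", "age_band": "50-59", "hypertension": 0, "diabetes": 0, "hiv": 1, "obesity": 0},
    {"sex": "M", "age_band": "50-59", "hypertension": 1, "diabetes": 1, "hiv": 1, "obesity": 0},
]

overall, ranked = stratify(records, ["sex", "age_band"])
for description, size, rate, excess in ranked:
    print(description, size, round(rate, 3), round(excess, 3))
```

In this toy data the top-ranked group is defined by sex and age band together, while neither feature alone isolates it, which is the kind of pattern a single-variable stratification would miss. Exhaustive enumeration grows exponentially with the number of features and makes no correction for multiple comparisons, so it is only practical for a handful of categorical variables.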