Numero: a statistical framework to define multivariable subgroups in complex population-based datasets.

Affiliations

Heart Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia.

School of Biological Sciences, University of Adelaide, Adelaide, SA, Australia.

Publication information

Int J Epidemiol. 2019 Apr 1;48(2):369-374. doi: 10.1093/ije/dyy113.

Abstract

Large-scale epidemiological and population data provide opportunities to identify subgroups of people who are at risk of disease or exposed to adverse environments. Clustering algorithms are popular data-driven tools to identify these subgroups; however, relying exclusively on algorithms may not produce the best results if the dataset does not have a clustered structure. For this reason, we propose a framework (the R-library Numero) that combines the self-organizing map algorithm, permutation analysis for statistical evidence and a final expert-driven subgrouping step. We used Numero to define subgroups in two examples without an obvious clustering structure: a biomedical dataset of kidney disease and another dataset of community-level socioeconomic indicators. We benchmarked the Numero subgroupings against popular clustering algorithms (principal components, K-means and hierarchical clustering). The Numero subgroupings were more intuitive and easier to interpret without losing mathematical quality. Therefore, we expect Numero to be useful for exploratory analyses of population-based epidemiological datasets.
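
The pipeline described in the abstract (self-organizing map, then permutation analysis) can be illustrated with a minimal sketch. This is not the Numero implementation or its API. It is a simplified NumPy illustration, and every function name, parameter and default below (`train_som`, `permutation_p_value`, the 4x4 grid, 500 permutations, the variance-of-unit-means statistic) is an assumption made for the example. The final expert-driven subgrouping step is a human judgement and is not coded here.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, seed=0):
    """Train a small self-organizing map (hypothetical helper, not Numero's API)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the map units, one row per unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))             # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    return weights, grid

def best_matching_units(data, weights):
    """Index of the closest map unit for every sample."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def unit_mean_variance(values, bmus, n_units):
    """Variance of a variable's per-unit means across occupied map units."""
    counts = np.bincount(bmus, minlength=n_units)
    sums = np.bincount(bmus, weights=values, minlength=n_units)
    occupied = counts > 0
    return np.var(sums[occupied] / counts[occupied])

def permutation_p_value(values, bmus, n_units, n_perm=500, seed=0):
    """Permutation test: is the variable's pattern on the map stronger than chance?"""
    rng = np.random.default_rng(seed)
    observed = unit_mean_variance(values, bmus, n_units)
    null = np.array([
        unit_mean_variance(rng.permutation(values), bmus, n_units)
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Synthetic data without any cluster structure (assumption: standard normal).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
weights, grid = train_som(X)
bmus = best_matching_units(X, weights)

# A training variable should show a clear pattern across the map ...
p_signal = permutation_p_value(X[:, 0], bmus, len(weights))
# ... whereas an unrelated variable has no reason to.
p_noise = permutation_p_value(rng.normal(size=300), bmus, len(weights))
print(p_signal, p_noise)
```

In this sketch, a small p-value only indicates that a variable varies across the map more than random relabelling would produce. Deciding where to draw subgroup boundaries on the map remains the expert-driven step the abstract describes.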
