Exploratory Data Mining for Subgroup Cohort Discoveries and Prioritization.

Finding small, homogeneous subgroup cohorts in large, heterogeneous populations is a critical step in hypothesis development for biomedical research. Current computational approaches still lack robust answers to the question "what hypotheses are likely to be novel and to produce clinically relevant results with well thought-out study designs?" We have developed a novel subgroup discovery method that employs a deep exploratory mining process to slice and dice thousands of potential subpopulations and to prioritize potential cohorts based on their explainable contrast patterns, which may in turn provide insights that can be acted on through intervention. We conducted computational experiments on both synthesized data and a clinical autism data set, assessing performance quantitatively by coverage of pre-defined cohorts on the former and qualitatively by novel knowledge discovery on the latter. We also conducted a scaling analysis in a distributed computing environment to estimate the computational resources needed as the number of subpopulations increases. This work will provide a robust data-driven framework to automatically tailor potential interventions for precision health.
