IEEE J Biomed Health Inform. 2020 May;24(5):1456-1468. doi: 10.1109/JBHI.2019.2939149. Epub 2019 Sep 5.
Finding small homogeneous subgroup cohorts in large heterogeneous populations is a critical step in hypothesis development for biomedical research. Current computational approaches still lack robust answers to the question "what hypotheses are likely to be novel and to produce clinically relevant results with well thought-out study designs?" We have developed a novel subgroup discovery method that uses a deep exploratory mining process to slice and dice thousands of potential subpopulations and to prioritize potential cohorts by their explainable contrast patterns, which may provide insights that can inform interventions. We conducted computational experiments on synthesized data, to assess performance quantitatively by coverage of pre-defined cohorts, and on a clinical autism data set, to assess novel knowledge discovery qualitatively. We also conducted a scaling analysis in a distributed computing environment to estimate the computational resources needed as the number of subpopulations increases. This work will provide a robust data-driven framework to automatically tailor potential interventions for precision health.
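As a rough illustration of what enumerating subpopulations and ranking them by contrast can look like, the sketch below exhaustively forms conjunctions of attribute-value conditions and scores each by the absolute difference in outcome rate between the subgroup and the rest of the population. This is a generic subgroup discovery toy, not the method described in the abstract: the function name `discover_subgroups`, the scoring rule, the size threshold, and the fields `sex`, `age`, and `responder` are all assumptions made for the example.

```python
from itertools import combinations


def discover_subgroups(records, outcome, max_conditions=2, min_size=2):
    """Rank attribute-value conjunctions by outcome-rate contrast (toy sketch)."""
    attributes = sorted({k for r in records for k in r if k != outcome})
    # Every (attribute, value) pair seen in the data is a candidate condition.
    conditions = sorted({(a, r[a]) for r in records for a in attributes if a in r})
    results = []
    for n in range(1, max_conditions + 1):
        for combo in combinations(conditions, n):
            # Skip conjunctions that constrain the same attribute twice.
            if len({a for a, _ in combo}) < n:
                continue
            inside, outside = [], []
            for r in records:
                matches = all(r.get(a) == v for a, v in combo)
                (inside if matches else outside).append(r)
            # Ignore subgroups that are too small or that cover everyone.
            if len(inside) < min_size or not outside:
                continue
            rate_in = sum(r[outcome] for r in inside) / len(inside)
            rate_out = sum(r[outcome] for r in outside) / len(outside)
            results.append((abs(rate_in - rate_out), combo, len(inside)))
    # Highest contrast first; ties are broken by the condition tuple.
    results.sort(key=lambda t: (-t[0], t[1]))
    return results


# Hypothetical toy data: only young female participants respond.
toy = [
    {"sex": "F", "age": "child", "responder": 1},
    {"sex": "F", "age": "child", "responder": 1},
    {"sex": "F", "age": "adult", "responder": 0},
    {"sex": "M", "age": "child", "responder": 0},
    {"sex": "M", "age": "adult", "responder": 0},
    {"sex": "M", "age": "adult", "responder": 0},
]
ranked = discover_subgroups(toy, "responder")
top_contrast, top_conditions, top_size = ranked[0]
```

On this toy data the highest-ranked subgroup is the conjunction `age == "child"` and `sex == "F"`, which no single attribute isolates on its own. The number of candidate conjunctions grows combinatorially with the number of attributes and the conjunction depth, which is the kind of growth that makes a scaling analysis relevant.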