Esnault Cyril, Gadonna May-Line, Queyrel Maxence, Templier Alexandre, Zucker Jean-Daniel
Quinten France, Paris, France.
Sorbonne University, IRD, UMMISCO, Bondy, France.
Front Artif Intell. 2020 Dec 17;3:559927. doi: 10.3389/frai.2020.559927. eCollection 2020.
Addressing the heterogeneity of both disease outcomes and treatment response to an intervention is a mandatory pathway for the regulatory approval of medicines. In randomized clinical trials (RCTs), confirmatory subgroup analyses focus on assessing drugs in predefined subgroups, while exploratory analyses allow the a posteriori identification of subsets of patients who respond differently. Within the latter area, the subgroup discovery (SD) data mining approach is widely used, particularly in precision medicine, to evaluate treatment effects across different groups of patients from various data sources (clinical trials or real-world data). However, two issues hinder hypothesis generation and the acceptance of findings by healthcare authorities and practitioners: standard SD algorithms give limited consideration to the recommended criteria for defining credible subgroups, and findings lack statistical power after correction for multiple testing. In this paper, we present the Q-Finder algorithm, which aims to generate statistically credible subgroups to answer clinical questions, such as finding the drivers of natural disease progression or of treatment response. It combines an exhaustive search with a cascade of filters based on metrics that assess key credibility criteria, including relative risk reduction, adjustment for confounding factors, each individual feature's contribution to the subgroup's effect, interaction tests for between-subgroup treatment effect interactions, and adjustment for multiple testing. This allows Q-Finder to directly target and assess subgroups on recommended credibility criteria. The top-k credible subgroups are then selected, accounting for subgroup diversity and, possibly, clinical relevance. These subgroups are tested on independent data to assess their consistency across databases, while statistical power is preserved by limiting the number of tests.
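The general pattern of an exhaustive search followed by a cascade of filters, multiple-testing adjustment and diverse top-k selection can be illustrated with a small sketch. This is a hypothetical, simplified example, not the Q-Finder implementation: it assumes binary features and a binary outcome, uses the event-rate ratio of a subgroup versus its complement as the effect metric, a pooled two-proportion z-test, a Bonferroni correction and a Jaccard-overlap threshold for diversity. Confounder adjustment, feature-contribution checks and treatment-by-subgroup interaction tests are omitted. All function names, parameters and the demo data are invented for illustration.

```python
# Hypothetical sketch of an exhaustive search + filter cascade + diverse top-k.
# Not Q-Finder's actual metrics, thresholds or code.
from itertools import combinations
from math import erfc, sqrt


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variability: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def enumerate_subgroups(patients, features, max_len=2):
    """Exhaustively yield (rule, member indices) for conjunctions of binary conditions."""
    conditions = [(f, v) for f in features for v in (0, 1)]
    for k in range(1, max_len + 1):
        for rule in combinations(conditions, k):
            if len({f for f, _ in rule}) < k:
                continue  # skip rules that test the same feature twice
            members = frozenset(i for i, p in enumerate(patients)
                                if all(p[f] == v for f, v in rule))
            yield rule, members


def find_subgroups(patients, features, min_size=10, min_rr=1.5,
                   alpha=0.05, k=3, max_overlap=0.4):
    candidates = list(enumerate_subgroups(patients, features))
    n_tests = len(candidates)  # every candidate counts toward the correction
    n_total = len(patients)
    events_total = sum(p["outcome"] for p in patients)
    kept = []
    for rule, members in candidates:
        n_in, n_out = len(members), n_total - len(members)
        # Filter 1: both the subgroup and its complement must be large enough.
        if min(n_in, n_out) < max(min_size, 1):
            continue
        x_in = sum(patients[i]["outcome"] for i in members)
        x_out = events_total - x_in
        rate_in, rate_out = x_in / n_in, x_out / n_out
        # Filter 2: event-rate ratio (subgroup vs. complement) of at least min_rr.
        if rate_in == 0 or rate_in < min_rr * rate_out:
            continue
        # Filter 3: Bonferroni-adjusted significance.
        p = two_proportion_p(x_in, n_in, x_out, n_out)
        if p * n_tests > alpha:
            continue
        kept.append((p, rule, members))
    kept.sort(key=lambda t: t[0])  # most significant first (stable for ties)
    # Greedy top-k selection: skip a subgroup whose Jaccard overlap with an
    # already selected one exceeds max_overlap.
    selected = []
    for p, rule, members in kept:
        if all(len(members & m) / len(members | m) <= max_overlap
               for _, _, m in selected):
            selected.append((p, rule, members))
        if len(selected) == k:
            break
    return [(rule, p) for p, rule, _ in selected]


# Synthetic demo data (hypothetical): events are frequent only when a == 1 and b == 1.
demo = []
for i in range(200):
    a, b, c = i % 2, (i // 2) % 2, (i // 4) % 2
    outcome = int((a == 1 and b == 1 and i % 5 != 0) or i % 10 == 0)
    demo.append({"a": a, "b": b, "c": c, "outcome": outcome})

result = find_subgroups(demo, ["a", "b", "c"])
for rule, p in result:
    print(rule, f"p={p:.2e}")
```

On the demo data, the planted subgroup (a = 1 and b = 1) ranks first, while the broader subgroup b = 1, which passes every filter, is skipped because it overlaps too much with that already selected subgroup. Counting every enumerated candidate in the correction is a deliberately conservative choice; restricting the number of tests, as the abstract describes for the validation step, is what preserves power.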
To illustrate this algorithm, we applied it to the database of the International Diabetes Management Practice Study (IDMPS) to better understand the drivers of improved glycemic control and of the rate of hypoglycemic episodes in patients with type 2 diabetes. We compared Q-Finder with state-of-the-art approaches from both the Subgroup Identification and the Knowledge Discovery in Databases literature. The results demonstrate its ability to identify and support a short list of highly credible and diverse data-driven subgroups for both prognostic and predictive tasks.