Baron Cantin, Cherkaoui Sarah, Therrien-Laperriere Sandra, Ilboudo Yann, Poujol Raphaël, Mehanna Pamela, Garrett Melanie E, Telen Marilyn J, Ashley-Koch Allison E, Bartolucci Pablo, Rioux John D, Lettre Guillaume, Des Rosiers Christine, Ruiz Matthieu, Hussin Julie G
Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Québec, Canada.
Montreal Heart Institute, Québec, Canada.
bioRxiv. 2023 Mar 24:2023.03.22.533869. doi: 10.1101/2023.03.22.533869.
Studies combining metabolomics and genetics, known as metabolite genome-wide association studies (mGWAS), have provided valuable insights into the genetic control of metabolite levels. However, the biological interpretation of these associations remains challenging because few tools exist to annotate mGWAS gene-metabolite pairs beyond applying conservative statistical significance thresholds. Here, we computed the shortest reactional distance (SRD) based on the curated knowledge of the KEGG database to explore its utility in enhancing the biological interpretation of results from three independent mGWAS, including a case study on sickle cell disease patients. Results show that, in reported mGWAS pairs, there is an excess of small SRD values and that SRD values and p-values significantly correlate, even beyond the standard conservative thresholds. The added value of SRD annotation is shown for the identification of potential false negative hits, exemplified by the finding of gene-metabolite associations with SRD ≤1 that did not reach the standard genome-wide significance cut-off. The wider use of this statistic as an mGWAS annotation would prevent the exclusion of biologically relevant associations and can also identify errors or gaps in current metabolic pathway databases. Our findings highlight the SRD metric as an objective, quantitative and easy-to-compute annotation for gene-metabolite pairs that can be used to integrate statistical evidence with biological networks.
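To make the SRD idea concrete, the following minimal Python sketch computes a shortest reactional distance by breadth-first search over a toy reaction network. It is an illustration under stated assumptions, not the authors' implementation: the network, the gene and metabolite names, and the function `shortest_reactional_distance` are hypothetical, and the distance convention used here (0 when the metabolite takes part in a reaction catalyzed by the gene's product, plus 1 for each further reaction reached through a shared metabolite) is assumed rather than taken from the abstract. A real analysis would build the network from KEGG rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical toy network (not KEGG data):
# reaction ID -> genes encoding its enzyme, and the metabolites it involves.
REACTIONS = {
    "R1": {"genes": {"GENE_A"}, "metabolites": {"m1", "m2"}},
    "R2": {"genes": {"GENE_B"}, "metabolites": {"m2", "m3"}},
    "R3": {"genes": {"GENE_C"}, "metabolites": {"m3", "m4"}},
}


def shortest_reactional_distance(gene, metabolite, reactions=REACTIONS):
    """Return the assumed SRD between a gene and a metabolite.

    Assumed convention: 0 if the metabolite takes part in a reaction
    catalyzed by the gene's product; each hop to a neighbouring reaction
    (one sharing at least one metabolite) adds 1. Returns None when the
    gene is absent from the network or the metabolite is unreachable.
    """
    # Multi-source BFS: start from every reaction associated with the gene.
    start = [r for r, info in reactions.items() if gene in info["genes"]]
    queue = deque((r, 0) for r in start)
    seen = set(start)
    while queue:
        rxn, dist = queue.popleft()
        if metabolite in reactions[rxn]["metabolites"]:
            return dist
        # Visit reactions that share a metabolite with the current one.
        for other, info in reactions.items():
            if other not in seen and info["metabolites"] & reactions[rxn]["metabolites"]:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


if __name__ == "__main__":
    for met in ("m1", "m3", "m4"):
        print("GENE_A", met, shortest_reactional_distance("GENE_A", met))
```

In this toy network, a pair such as GENE_A and m3 would fall under the "SRD ≤1" category mentioned in the abstract, since m3 is one reaction away from the reaction catalyzed by GENE_A's product.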