Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Bioinformatics. 2022 Jun 27;38(13):3385-3394. doi: 10.1093/bioinformatics/btac356.
Our ability to identify the genetic factors underlying mouse genetic models of human diseases and biomedical traits has been limited because the true causative factors are often obscured by the many false-positive genetic associations that a genome-wide association study (GWAS) produces.
To accelerate the pace of genetic discovery, we developed a graph neural network (GNN)-based automated pipeline (GNNHap) that can rapidly analyze mouse genetic model data and identify high-probability causal genetic factors for the analyzed traits. After assessing the strength of allelic associations with the strain response pattern, the pipeline analyzes 29M published papers to assess candidate gene-phenotype relationships and incorporates information obtained from a protein-protein interaction network and protein sequence features into the analysis. The GNN model produces markedly improved results relative to those of a simple linear neural network. We demonstrate that GNNHap can identify novel causative genetic factors for murine models of diabetes/obesity and for cataract formation, which were validated by the phenotypes appearing in previously analyzed gene knockout mice. The diabetes/obesity results indicate how characterization of the underlying genetic architecture enables new therapies to be discovered and tested by applying 'precision medicine' principles to murine models.
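To illustrate why a graph-based model can draw on network context that a simple linear model ignores, the following is a minimal sketch of one round of neighbor averaging (a generic message-passing step) on a toy graph. It is not the GNNHap architecture: the node names, feature vectors, edges, and the helper functions `mean_aggregate` and `link_score` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One message-passing round: each node's new vector is the mean of
    its own vector and the vectors of its direct neighbors."""
    updated: Dict[str, Vector] = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [
            sum(vec[i] for vec in group) / len(group)
            for i in range(len(own))
        ]
    return updated


def link_score(a: Vector, b: Vector) -> float:
    """Dot product used here as a simple gene-trait affinity score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy data (not taken from the paper).
features = {
    "geneA": [1.0, 0.0],
    "geneB": [0.0, 1.0],
    "geneC": [0.0, 1.0],
    "trait": [1.0, 0.0],
}
# Undirected edges listed in both directions:
# geneA-trait stands in for a literature link,
# geneA-geneC stands in for a protein-protein interaction.
neighbors = {
    "geneA": ["trait", "geneC"],
    "geneC": ["geneA"],
    "trait": ["geneA"],
    "geneB": [],
}

embedded = mean_aggregate(features, neighbors)
scores = {
    gene: link_score(embedded[gene], embedded["trait"])
    for gene in ("geneA", "geneB", "geneC")
}
```

In this toy setting, `geneB` and `geneC` start with identical features, but `geneC` ends up with a higher score against the trait because it interacts with `geneA`, which is itself linked to the trait. A model that scores each gene from its own features alone cannot separate the two.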
The GNNHap source code is freely available at https://github.com/zqfang/gnnhap, and the new version of the HBCGM program is available at https://github.com/zqfang/haplomap.
Supplementary data are available at Bioinformatics online.