Ghose Upamanyu, Sproviero William, Winchester Laura, Amin Najaf, Zhu Taiyu, Newby Danielle, Ulm Brittany S, Papathanasiou Angeliki, Shi Liu, Liu Qiang, Fernandes Marco, Adams Cassandra, Albukhari Ashwag, Almansouri Majid, Choudhry Hani, van Duijn Cornelia, Nevado-Holgado Alejo
Department of Psychiatry, University of Oxford, Oxford, United Kingdom.
King Abdulaziz University and the University of Oxford Centre for Artificial Intelligence in Precision Medicine (KO-CAIPM), Jeddah, Saudi Arabia.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae704.
Augmenting traditional genome-wide association studies (GWAS) with advanced machine learning algorithms can allow the detection of novel signals in available cohorts. We introduce "genome-wide association neural networks (GWANN)", a novel approach that uses neural networks (NNs) to perform a gene-level association study with family history of Alzheimer's disease (AD). In UK Biobank, we defined cases (n = 42 110) as those with AD or family history of AD and sampled an equal number of controls. The data were split 80:20 into training and testing samples; GWANN was trained on the former, and associated genes were identified from its performance on the latter. Our method identified 18 genes associated with family history of AD. Six of these, APOE, BIN1, SORL1, ADAM10, APH1B, and SPI1, have been identified by previous AD GWAS. Among the 12 new genes, PCDH9, NRG3, ROR1, LINGO2, SMYD3, and LRRC7 have been associated with neurofibrillary tangles or phosphorylated tau in previous studies. Furthermore, there is evidence of differential transcriptomic or proteomic expression between AD and healthy brains for 10 of the 12 new genes. A series of post hoc analyses revealed a significantly enriched protein-protein interaction network (P-value < 1 × 10⁻¹⁶) and enrichment of relevant disease and biological pathways such as focal adhesion (P-value = 1 × 10⁻⁴), extracellular matrix organization (P-value = 1 × 10⁻⁴), Hippo signaling (P-value = 7 × 10⁻⁴), Alzheimer's disease (P-value = 3 × 10⁻⁴), and impaired cognition (P-value = 4 × 10⁻³). Applying NNs to GWAS illustrates their potential to complement existing algorithms and methods and to enable the discovery of new associations without the need to expand existing cohorts.
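The cohort construction described in the abstract (an equal number of controls sampled to match the cases, followed by an 80:20 split into training and testing samples) can be illustrated with a minimal sketch. This is not the authors' code: the function name `balanced_split`, the toy participant identifiers, the fixed random seed, and the choice to split cases and controls separately so that both sets stay balanced are all assumptions made for illustration.

```python
import random


def balanced_split(case_ids, control_pool, train_frac=0.8, seed=0):
    """Sample as many controls as there are cases, then split each
    group into training and testing samples (hypothetical helper)."""
    rng = random.Random(seed)

    # Draw an equal number of controls, without replacement.
    controls = rng.sample(control_pool, k=len(case_ids))

    train, test = [], []
    # Assumption: split cases (label 1) and controls (label 0) separately,
    # so the 1:1 case-control ratio holds in both resulting sets.
    for label, ids in ((1, list(case_ids)), (0, controls)):
        rng.shuffle(ids)
        cut = round(train_frac * len(ids))
        train.extend((i, label) for i in ids[:cut])
        test.extend((i, label) for i in ids[cut:])
    return train, test


# Toy identifiers only; the real cohort has 42 110 cases.
cases = list(range(100))
control_pool = list(range(1000, 2000))
train, test = balanced_split(cases, control_pool)
print(len(train), len(test))  # 160 40
```

In a setup like this, a model would be fitted on `train` only, and `test` would be held back so that any gene-level association is judged on samples the model has not seen.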