Department of Human Genetics, McGill University, Montreal, QC, Canada.
Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC, Canada.
BMC Psychiatry. 2020 Feb 28;20(1):92. doi: 10.1186/s12888-020-02503-5.
Machine learning (ML) algorithms and methods offer powerful tools for analyzing large, complex genomic datasets. Our goal was to compare the genomic architecture of schizophrenia (SCZ) and autism spectrum disorder (ASD) using ML.
In this paper, we used regularized gradient boosted machines to analyze whole-exome sequencing (WES) data from individuals with SCZ and ASD in order to identify important distinguishing genetic features. We further demonstrated a method of gene clustering to highlight which subsets of genes identified by the ML algorithm are mutated concurrently in affected individuals and are central to each disease (i.e., ASD vs. SCZ "hub" genes).
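The idea of genes "mutated concurrently in affected individuals" can be illustrated with a small co-mutation count. The abstract does not specify the clustering procedure, so the following is only a minimal sketch under stated assumptions: the function names (`co_mutation_counts`, `hub_genes`), the thresholds (`min_shared`, `top_n`), and the placeholder gene and case labels are all hypothetical, and ranking genes by their number of co-mutated partners is one simple stand-in for a "hub" definition, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def co_mutation_counts(mutated_genes_by_individual):
    """Count, for each gene pair, how many individuals carry mutations in both genes."""
    pair_counts = Counter()
    for genes in mutated_genes_by_individual.values():
        # Sort the de-duplicated genes so each pair has one canonical (a, b) order.
        for a, b in combinations(sorted(set(genes)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def hub_genes(pair_counts, min_shared=2, top_n=3):
    """Rank genes by how many partners they are co-mutated with in >= min_shared individuals."""
    degree = Counter()
    for (a, b), n_individuals in pair_counts.items():
        if n_individuals >= min_shared:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for a deterministic order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy input (assumed, not real data): individual -> genes flagged by the classifier.
toy_cases = {
    "case1": ["GENE_A", "GENE_B", "GENE_C"],
    "case2": ["GENE_A", "GENE_B"],
    "case3": ["GENE_A", "GENE_C"],
    "case4": ["GENE_B", "GENE_D"],
}

pairs = co_mutation_counts(toy_cases)
hubs = hub_genes(pairs)
print(hubs)
```

In this toy input, GENE_A ranks first because it shares at least two carriers with both GENE_B and GENE_C. Running the same count separately on ASD and SCZ cases would give one candidate hub list per condition.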
In summary, after correcting for population structure, we found that SCZ and ASD cases could be successfully separated based on genetic information, with 86-88% accuracy on the testing dataset. Through bioinformatic analysis, we explored whether combinations of genes concurrently mutated in patients with the same condition ("hub" genes) belong to specific pathways. Several themes were found to be associated with ASD, including calcium ion transmembrane transport, immune system/inflammation, synapse organization, and retinoid metabolic process. For SCZ, the highlighted themes were ion transmembrane transport, neurotransmitter transport, and microtubule/cytoskeleton processes.
Our manuscript introduces a novel comparative approach for studying the genetic architecture of genetically related diseases with complex inheritance and highlights genetic similarities and differences between ASD and SCZ.