Vasilopoulou Christina, Morris Andrew P, Giannakopoulos George, Duguez Stephanie, Duddy William
Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK.
Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, UK.
J Pers Med. 2020 Nov 26;10(4):247. doi: 10.3390/jpm10040247.
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but our current knowledge of the molecular mechanisms and pathways underlying this disease remains limited. This review (1) systematically identifies machine learning studies aimed at understanding the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performance of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge, including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The review highlights the importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies. Lastly, it suggests that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.