Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA.
Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, USA.
Mutat Res Rev Mutat Res. 2023 Jan-Jun;791:108457. doi: 10.1016/j.mrrev.2023.108457. Epub 2023 Mar 23.
Genetic variation is one of the major causes of phenotypic variation among human individuals. Although germline mutations are beneficial as the substrate of evolution, they can also cause disease, including Mendelian diseases and complex diseases such as diabetes and heart disease. Mutations occurring in somatic cells are a main cause of cancer and likely cause age-related phenotypes and other age-related diseases. Because genetic variants are so abundant in the human genome, i.e., millions of germline variants per individual and thousands of additional somatic mutations per cell, it is technically challenging to experimentally verify the function of every possible mutation and of the interactions among mutations. Significant progress has been made toward solving this problem with computational approaches, especially machine learning (ML). Here, we review the progress and achievements made in this field in recent years. We classify the computational models in two ways: by their prediction goals, including protein structural alterations, gene expression changes, and disease risks; and by their methodologies, including non-machine learning methods, classical machine learning methods, and deep neural network methods. For models in each category, we discuss their architecture, prediction accuracy, and potential limitations. This review provides new insights into the applications and future directions of computational approaches for understanding the role of mutations in aging and disease.