Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan; Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan.
Med. 2021 May 14;2(5):611-632.e9. doi: 10.1016/j.medj.2021.02.003. Epub 2021 Mar 11.
Although the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines for variant interpretation are widely used in clinical genetics, these knowledge-based guidelines leave room for improvement.
We statistically assessed the average deleteriousness of start-lost, stop-lost, and in-frame insertion and deletion (indel) variants and extracted deleterious subsets, guided by the proportions of rare variants in the general population of the Genome Aggregation Database (gnomAD). We also constructed a machine learning-based model that scores the pathogenicity of start-lost variants (the PoStaL model) by predicting possible translation initiation sites on transcripts with deep learning and training a random forest on known pathogenic and likely benign variants.
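The rare-variant proportion used as the guide above can be illustrated with a minimal sketch. The function name `rare_proportion`, the input layout, and the allele-frequency cutoff of 1e-4 are assumptions made for illustration only; the abstract does not state how "rare" was defined, and the toy data are invented.

```python
from collections import defaultdict


def rare_proportion(variants, af_threshold=1e-4):
    """Return the fraction of rare variants per variant class.

    variants: iterable of (variant_class, allele_frequency) pairs.
    af_threshold: hypothetical cutoff below which a variant counts as rare;
        the study's actual definition is not given in the abstract.
    """
    total = defaultdict(int)
    rare = defaultdict(int)
    for variant_class, allele_frequency in variants:
        total[variant_class] += 1
        if allele_frequency < af_threshold:
            rare[variant_class] += 1
    # A higher proportion is read as stronger average deleteriousness.
    return {cls: rare[cls] / total[cls] for cls in total}


# Invented toy records, not gnomAD data.
toy_variants = [
    ("stop_lost", 1e-5),
    ("stop_lost", 2e-5),
    ("stop_lost", 1e-5),
    ("stop_lost", 0.01),
    ("start_lost", 1e-5),
    ("start_lost", 0.2),
]
proportions = rare_proportion(toy_variants)
```

Comparing the resulting proportions across classes gives the kind of ranking reported in the findings, although a real analysis would also need the study's own variant filters and statistical tests.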
The proportion of rare variants, taken here as an indicator of average deleteriousness, was highest in stop-lost variants, followed by in-frame indels and then start-lost variants. This ordering suggests that the ACMG/AMP criteria assigning PVS (pathogenic very strong) to start-lost variants and PM (pathogenic moderate) to stop-lost and in-frame indel variants may not be appropriate. Regarding deleterious subsets, stop-lost variants introducing extensions of more than 30 amino acids and in-frame indels computationally predicted to be damaging were enriched for rare and known pathogenic variants. For start-lost variants, we developed the PoStaL model, which outperforms existing tools. We also provide comprehensive lists of the PoStaL scores for start-lost variants and of the lengths of the amino acid extensions caused by stop-lost variants.
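The extension length of a stop-lost variant can be sketched as a scan for the next in-frame stop codon. This is a minimal illustration, not the authors' pipeline: the function name `extension_length`, the input convention (a DNA string that starts at the first base of the mutated stop codon and continues into the 3' UTR), and the choice to count the mutated stop codon itself as the first added residue are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extension_length(seq_from_mutated_stop):
    """Count the amino acids added by a stop-lost variant.

    seq_from_mutated_stop: DNA sequence beginning at the mutated stop codon
        and running into the 3' UTR (hypothetical input convention).
    Returns the number of codons read before the next in-frame stop codon,
    or None if no in-frame stop codon is found in the given sequence.
    """
    seq = seq_from_mutated_stop.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_long_extension(seq_from_mutated_stop, min_exclusive=30):
    """Flag extensions of more than 30 amino acids, the subset named above."""
    length = extension_length(seq_from_mutated_stop)
    return length is not None and length > min_exclusive


# Toy sequences: mutated stop codon CAA, then GCT codons, then a new stop TGA.
short_example = "CAA" + "GCT" * 2 + "TGA" + "AAA"
long_example = "CAA" + "GCT" * 30 + "TGA"
```

Sequences without any downstream in-frame stop codon return `None` here and are not flagged; how such cases should be scored is left open in this sketch.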
Our study could contribute to refinement of the ACMG/AMP guidelines, provides resources for future investigation, and offers an example of how data-driven approaches can improve knowledge-based frameworks.
The study was supported by grants from the Japan Agency for Medical Research and Development (AMED) and the Japan Society for the Promotion of Science (JSPS).