

Refinement of the clinical variant interpretation framework by statistical evidence and machine learning.

Affiliations

Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan; Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.

Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan.

Publication information

Med. 2021 May 14;2(5):611-632.e9. doi: 10.1016/j.medj.2021.02.003. Epub 2021 Mar 11.

Abstract

BACKGROUND

Although the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines for variant interpretation are used widely in clinical genetics, there is room for improvement of these knowledge-based guidelines.

METHODS

We statistically assessed the average deleteriousness of start-lost, stop-lost, and in-frame insertion and deletion (indel) variants and extracted deleterious subsets, guided by the proportions of rare variants in the general population of the Genome Aggregation Database (gnomAD). We constructed a machine learning-based model that scores the pathogenicity of start-lost variants (the PoStaL model) by predicting possible translation initiation sites on transcripts with deep learning and training a random forest on known pathogenic and likely benign variants.
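As a rough illustration of the first step, the proportion of rare variants can be computed per variant class and compared across classes, on the general reasoning that a class under stronger purifying selection tends to show a larger share of rare variants. The sketch below is not the authors' pipeline: the allele-frequency cutoff `RARE_AF_THRESHOLD`, the function name, and the toy records are all assumptions made for illustration, since the abstract does not say how "rare" was defined.

```python
from collections import defaultdict

# Hypothetical cutoff: the abstract does not state how "rare" is defined.
RARE_AF_THRESHOLD = 1e-4


def rare_proportion_by_class(variants, threshold=RARE_AF_THRESHOLD):
    """Return {variant_class: fraction of variants with allele frequency < threshold}."""
    totals = defaultdict(int)
    rare = defaultdict(int)
    for variant_class, allele_frequency in variants:
        totals[variant_class] += 1
        if allele_frequency < threshold:
            rare[variant_class] += 1
    return {cls: rare[cls] / totals[cls] for cls in totals}


# Toy records (made-up values for illustration only, not gnomAD data).
toy_variants = [
    ("stop_lost", 2e-5),
    ("stop_lost", 5e-6),
    ("stop_lost", 3e-3),
    ("start_lost", 1e-5),
    ("start_lost", 2e-2),
    ("inframe_indel", 4e-5),
    ("inframe_indel", 8e-4),
]

proportions = rare_proportion_by_class(toy_variants)
for cls, frac in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{cls}: {frac:.2f}")
```

In a real analysis the input would come from gnomAD annotations, and the comparison between classes would need an appropriate statistical test rather than a raw ranking.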

FINDINGS

The proportion of rare variants was highest in stop-lost variants, followed by in-frame indels and start-lost variants, suggesting that the ACMG/AMP criteria that assign PVS (pathogenic very strong) to start-lost variants and PM (pathogenic moderate) to stop-lost and in-frame indel variants may not be appropriate. Regarding deleterious subsets, stop-lost variants that introduce extensions of more than 30 amino acids and in-frame indels computationally predicted to be damaging are enriched for rare and known pathogenic variants. For start-lost variants, we developed the PoStaL model, which outperforms existing tools. We also provide comprehensive lists of the PoStaL scores for start-lost variants and of the lengths of the amino acid extensions caused by stop-lost variants.
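The 30-amino-acid criterion for stop-lost variants can be made concrete with a small sketch: once the stop codon is lost, translation continues in frame until the next stop codon. The code below is a minimal illustration under stated assumptions, not the authors' method. It assumes the input is the transcript sequence immediately downstream of the original stop codon, and it counts the mutated stop codon itself as the first added residue; the function names are hypothetical, and the abstract does not specify the paper's exact counting convention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_lost_extension_length(downstream_seq):
    """Estimate the number of amino acids added by a stop-lost variant.

    `downstream_seq` is the transcript sequence immediately after the
    original stop codon (an assumed input). The mutated stop codon is
    counted as one added residue (an assumed convention). Returns None
    if no in-frame stop codon is found in the supplied sequence.
    """
    seq = downstream_seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            # i // 3 sense codons precede the new stop, plus the mutated stop codon.
            return 1 + i // 3
    return None


def is_long_extension(length, threshold=30):
    """True if the extension exceeds `threshold` amino acids (more than 30 in the abstract)."""
    return length is not None and length > threshold


# Toy sequence: two sense codons, then a stop codon.
example_length = stop_lost_extension_length("GCTGCTTAA")
print(example_length, is_long_extension(example_length))
```

Cases where no downstream in-frame stop exists are returned as `None` here and would need separate handling in practice.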

CONCLUSIONS

Our study could contribute to refinement of the ACMG/AMP guidelines. It also provides resources for future investigation and an example of how to improve knowledge-based frameworks with data-driven approaches.

FUNDING

The study was supported by grants from the Japan Agency for Medical Research and Development (AMED) and the Japan Society for the Promotion of Science (JSPS).

