Research Centre for Medical Genetics, Moscow, Russia.
Laboratoire de Biologie Structurale de la Cellule, École Polytechnique, Paris, France.
Nucleic Acids Res. 2023 Feb 22;51(3):1229-1244. doi: 10.1093/nar/gkac1247.
An increasing number of studies emphasize the role of non-coding variants in the development of hereditary diseases. However, interpreting such variants in clinical genetic testing remains a critical challenge because their pathogenicity mechanisms are poorly understood. It was previously shown that variants in 5'-untranslated regions (5'UTRs) can lead to hereditary diseases by disrupting upstream open reading frames (uORFs). Here, we manually annotated upstream translation initiation sites (TISs) in human disease-associated genes from the OMIM database and identified ∼4.7 thousand TISs related to uORFs. We compared our TISs with those reported in previous studies and provided a list of 'high-confidence' uORFs. Using a luciferase assay, we experimentally validated the translation of uORFs in the ETFDH, PAX9, MAST1, HTT, TTN, GLI2 and COL2A1 genes, as well as the existence of an N-terminal CDS extension in the ZIC2 gene. In addition, we created a tool to annotate the effects of genetic variants located in uORFs. We identified variants in the HGMD and ClinVar databases that disrupt uORFs and could thereby lead to Mendelian disorders. We also showed that the distribution of uORF-affecting variants differs between pathogenic and population variants. Finally, using the manually curated data, we developed a machine-learning algorithm that predicts TISs in other human genes.
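The abstract does not describe how the variant-annotation tool works internally. The following is a purely illustrative sketch under a deliberately simplified model, not the authors' tool. It assumes that a uORF starts at an upstream ATG and ends at the first in-frame stop codon inside the 5'UTR. It ignores near-cognate non-ATG start codons, Kozak context and whether an unterminated uORF overlaps or extends the CDS. Given this model, it compares the uORFs of the reference and alternative sequences for a single-nucleotide variant. The names `find_uorfs`, `annotate_snv` and the effect labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class UORF:
    # 0-based index of the A of the upstream ATG within the 5'UTR.
    start: int
    # 0-based index of the first base of the in-frame stop codon, or None if
    # no stop occurs before the UTR ends (the ORF may run into the CDS).
    stop: Optional[int]


def find_uorfs(utr: str) -> List[UORF]:
    """Find ATG-initiated uORFs in a 5'UTR (simplified, hypothetical model)."""
    utr = utr.upper()
    uorfs = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != "ATG":
            continue
        stop = None
        # Scan complete in-frame codons after the start codon.
        for j in range(i + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOP_CODONS:
                stop = j
                break
        uorfs.append(UORF(i, stop))
    return uorfs


def annotate_snv(utr: str, pos: int, alt: str) -> List[str]:
    """Compare uORFs before and after substituting `alt` at 0-based `pos`."""
    if len(alt) != 1 or not 0 <= pos < len(utr):
        raise ValueError("expected a single-nucleotide substitution inside the UTR")
    ref = {u.start: u for u in find_uorfs(utr)}
    mutated = utr[:pos] + alt + utr[pos + 1:]
    var = {u.start: u for u in find_uorfs(mutated)}
    effects = []
    # Upstream ATGs destroyed or created by the variant.
    effects += [f"uATG_lost@{s}" for s in sorted(ref.keys() - var.keys())]
    effects += [f"uATG_gained@{s}" for s in sorted(var.keys() - ref.keys())]
    # Shared uORFs whose stop codon moved or disappeared (e.g. stop loss).
    effects += [
        f"uORF_stop_changed@{s}"
        for s in sorted(ref.keys() & var.keys())
        if ref[s].stop != var[s].stop
    ]
    return effects
```

For example, in the toy UTR `GCCATGAAATAGCC` a substitution at index 4 (T>C) removes the only upstream ATG, while a substitution at index 9 (T>C) turns the TAG stop into CAG, so the uORF no longer terminates inside the UTR. Comparing whole uORF sets between reference and alternative sequences, rather than inspecting only the mutated codon, keeps the logic simple and also catches new ATGs created across codon boundaries.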