Jaganathan Kishore, Ersaro Nicole, Novakovsky Gherman, Wang Yuchuan, James Terena, Schwartzentruber Jeremy, Fiziev Petko, Kassam Irfahan, Cao Fan, Hawe Johann, Cavanagh Henry, Lim Ashley, Png Grace, McRae Jeremy, Banerjee Abhimanyu, Kumar Arvind, Ulirsch Jacob, Zhang Yan, Aguet Francois, Wainschtein Pierrick, Sundaram Laksshman, Salcedo Adriana, Panagiotopoulou Sofia Kyriazopoulou, Aghamirzaie Delasa, Padhi Evin, Weng Ziming, Dong Shan, Smedley Damian, Caulfield Mark, O'Donnell-Luria Anne, Rehm Heidi L, Sanders Stephan J, Kundaje Anshul, Montgomery Stephen B, Ross Mark T, Farh Kyle Kai-How
Illumina Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA, USA.
Department of Pathology, Stanford University, Stanford, CA, USA.
Science. 2025 Aug 7;389(6760):eads7373. doi: 10.1126/science.ads7373.
Only a minority of patients with rare genetic diseases are presently diagnosed by exome sequencing, suggesting that additional unrecognized pathogenic variants may reside in noncoding sequence. In this work, we describe PromoterAI, a deep neural network that accurately identifies noncoding promoter variants that dysregulate gene expression. We show that promoter variants with predicted expression-altering consequences produce outlier expression at both the RNA and protein levels in thousands of individuals and that these variants experience strong negative selection in human populations. We observed that clinically relevant genes in patients with rare diseases are enriched for such variants and validated their functional impact through reporter assays. Our estimates suggest that promoter variation accounts for 6% of the genetic burden associated with rare diseases.