National Heart and Lung Institute, Imperial College London, W12 0NN London, UK; National Institute for Health Research (NIHR) Imperial Biomedical Research Centre, W2 1NY London, UK.
Topgen Biopharm Technology Co. Ltd., Shanghai 201203, China.
Am J Hum Genet. 2023 Nov 2;110(11):1903-1918. doi: 10.1016/j.ajhg.2023.09.005. Epub 2023 Oct 9.
Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Because early WGS analytic steps prioritize protein-coding sequences, we developed GROFFFY, an analytic tool that integrates coordinates for regions with experimental evidence of functionality, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences. Applied to WGS data from solved and unsolved hereditary hemorrhagic telangiectasia (HHT) recruits to the 100,000 Genomes Project, GROFFFY-based filtration reduced the mean number of variants per DNA sample from 4,867,167 to 21,486 without deleting disease-causal variants. In three unsolved cases (two related), GROFFFY identified ultra-rare deletions within the 3' untranslated region (UTR) of the tumor suppressor SMAD4, where germline loss-of-function alleles cause combined HHT and colonic polyposis (MIM: 175050). Sited >5.4 kb distal to coding DNA, the deletions did not modify or generate microRNA binding sites, but instead disrupted the sequence context of the final cleavage and polyadenylation site necessary for protein production: by iFoldRNA, an AAUAAA-adjacent 16-nucleotide deletion brought the cleavage site into inaccessible neighboring secondary structures, while a 4-nucleotide deletion unfolded the downstream RNA polymerase II roadblock. SMAD4 RNA expression differed from that of control-derived RNA from resting and cycloheximide-stressed peripheral blood mononuclear cells. These patterns predicted the mutational site for an unrelated HHT/polyposis-affected individual, in whom a complex insertion was subsequently identified. In conclusion, we describe a functional rare variant type that impacts regulatory systems based on RNA polyadenylation. Extension of coding sequence-focused gene panels is required to capture these variants.