Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK.
BMC Med Genomics. 2024 Oct 4;17(1):244. doi: 10.1186/s12920-024-02017-z.
Batten disease is a group of rare inherited neurodegenerative diseases. Juvenile CLN3 disease is the most prevalent type, and the most common pathogenic variant, shared by most patients, is the "1-kb" deletion, which removes two internal coding exons (7 and 8) of CLN3. Previously, we identified two transcripts in patient fibroblasts homozygous for the 1-kb deletion: the 'major' and 'minor' transcripts. To understand the full variety of disease transcripts and their role in disease pathogenesis, it is first necessary to investigate CLN3 transcription in "healthy" samples without juvenile CLN3 disease.
We leveraged PacBio long-read RNA sequencing datasets from ENCODE to investigate the full range of CLN3 transcripts across various tissues and cell types in human control samples. We then sought to validate their existence using data from other sources.
We found that a readthrough gene affects the quantification and annotation of CLN3. After taking this into account, we detected over 100 novel CLN3 transcripts, with no single dominantly expressed CLN3 transcript: the most abundant transcript has a median usage of 42.9%. Surprisingly, the known disease-associated 'major' and 'minor' transcripts are detected in these control samples; together, they have a median usage of 1.5% across 22 samples. Furthermore, we identified 48 CLN3 ORFs, of which 26 are novel. The predominant ORF, which encodes the canonical CLN3 protein isoform, has a median usage of 66.7%, meaning that around one-third of CLN3 transcripts encode protein isoforms with different stretches of amino acids. The same ORFs can also be found with alternative UTRs. Moreover, we were able to validate the translational potential of certain transcripts using public mass spectrometry data.
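The usage figures above are proportions summarised by a median across samples. As a minimal sketch, assuming "usage" means a transcript's share of all reads assigned to CLN3 within one sample (the abstract does not state the exact definition or the quantification tool used), the calculation could look like the following. The helper names `transcript_usage` and `median_usage`, the transcript IDs and the read counts are all hypothetical and for illustration only; this is not the authors' pipeline.

```python
from statistics import median

def transcript_usage(counts):
    """Fraction of a gene's reads assigned to each transcript in one sample.

    `counts` maps a (hypothetical) transcript ID to its read count for a single
    gene in a single sample. Returns None when the gene has no reads, because
    usage is undefined in that case.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return {tx: n / total for tx, n in counts.items()}

def median_usage(samples, transcripts):
    """Median, across samples, of the combined usage of `transcripts`.

    `transcripts` is a list of IDs: one ID for a single transcript, or several
    IDs to pool related transcripts before taking the median. A transcript
    missing from a sample counts as 0 usage; samples where the gene is not
    expressed at all are skipped.
    """
    per_sample = []
    for counts in samples:
        usage = transcript_usage(counts)
        if usage is None:
            continue
        per_sample.append(sum(usage.get(tx, 0.0) for tx in transcripts))
    return median(per_sample)

# Made-up read counts for one gene in three samples.
samples = [
    {"tx_A": 40, "tx_B": 50, "tx_C": 10},
    {"tx_A": 30, "tx_B": 60, "tx_C": 10},
    {"tx_A": 45, "tx_C": 5},  # tx_B not detected in this sample
]

print(median_usage(samples, ["tx_A"]))          # 0.4
print(median_usage(samples, ["tx_B", "tx_C"]))  # 0.6
```

Pooling is done per sample before the median is taken, which is one plausible reading of how two transcripts can be reported "together". Because each median is computed separately, the median usages of different transcripts need not add up to 100%.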
Overall, these findings provide valuable insights into the complexity of CLN3 transcription, highlighting the importance of studying both canonical and non-canonical CLN3 protein isoforms as well as the regulatory role of UTRs to fully comprehend the regulation and function(s) of CLN3. This knowledge is essential for investigating the impact of the 1-kb deletion and rare pathogenic variants on CLN3 transcription and disease pathogenesis.