Laboratory of Epigenomics and Chromatin Organization, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore.
Cardiovascular Research Institute, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117599, Singapore.
BMC Genomics. 2021 Nov 3;22(1):789. doi: 10.1186/s12864-021-08085-0.
Transposable elements (TEs) comprise nearly half of the human genome, and their insertions have profound effects on human genetic diversification as well as disease. Despite this significance, there is no consensus on which TE subfamilies remain active in the human genome. In this study, we therefore developed a novel statistical test for recently mobile subfamilies (RMSs), based on patterns of overlap with > 100,000 polymorphic indels.
Our analysis produced a catalogue of 20 high-confidence RMSs, which excludes many false positives in public databases. Intriguingly, however, it includes HERV-K, an LTR subfamily previously thought to be extinct. The RMS catalogue is strongly enriched for contributions to germline genetic disorders (P = 1.1e-10), and thus constitutes a valuable resource for diagnosing disorders of unknown aetiology using targeted TE-insertion screens. Remarkably, RMSs are also highly enriched for somatic insertions in diverse cancers (P = 2.8e-17), indicating a strong correlation between germline and somatic TE mobility. Using CRISPR/Cas9 deletion, we show that an RMS-derived polymorphic TE insertion increased the expression of RPL17, a gene associated with lower survival in liver cancer. More broadly, polymorphic TE insertions from RMSs were enriched near genes with allele-specific expression, suggesting widespread effects on gene regulation.
Using a novel statistical test, we have defined a catalogue of 20 recently mobile transposable element subfamilies. We illustrate the gene regulatory potential of RMS-derived polymorphic TE insertions, both by CRISPR/Cas9 deletion of a specific candidate in vitro and by genome-wide analysis of allele-specific expression. Our study presents novel insights into TE mobility and regulatory potential and provides a key resource for human disease genetics and population history studies.