School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China.
State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, 361102, China.
BMC Biol. 2024 Nov 26;22(1):273. doi: 10.1186/s12915-024-02069-8.
Accurate and comprehensive genomic annotation, including the full list of protein-coding genes, is vital for understanding the molecular mechanisms of human biology. We have previously shown that the genome contains a multitude of as-yet hidden functional exons and transcripts, some of which might represent novel mRNAs. These results are consistent with those from other groups and strongly argue that, two decades after the completion of the first draft of the human genome sequence, the current annotation of human genes and transcripts remains far from complete.
Using a targeted RNA enrichment technique, we showed that one of the novel functional exons we previously discovered, currently annotated as part of a long non-coding RNA, is in fact part of a novel protein-coding gene, InSETG-4, which encodes a novel human protein with no known homologs or motifs. We found that InSETG-4 is induced by various DNA-damaging agents across multiple cell types and therefore might represent a novel component of the DNA damage response. Although InSETG-4 has low abundance in bulk cell populations, amplification-based single-molecule fluorescence in situ hybridization (asmFISH) analysis showed that its expression is restricted to a small fraction of cells.
This study argues that as-yet undiscovered human protein-coding genes exist and provides an example of how targeted RNA enrichment techniques can help fill this major gap in our knowledge of the information encoded in the human genome.