Cerdán-Vélez Daniel, Tress Michael L
Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO).
bioRxiv. 2023 Jun 17:2023.06.14.544951. doi: 10.1101/2023.06.14.544951.
The WASH1 gene produces a protein that forms part of the developmentally important WASH complex. The WASH complex activates the Arp2/3 complex to initiate branched actin networks at the surface of endosomes. Curiously, the human reference gene set includes nine WASH1 genes. It is not clear how many of these are pseudogenes and how many are coding genes. Eight of the nine WASH1 genes reside in subtelomeric regions that are prone to rearrangement and duplication. Many of these subtelomeric regions had gaps in the GRCh38 human genome assembly, but the recently published T2T-CHM13 assembly from the Telomere to Telomere (T2T) Consortium has filled in the gaps. As a result, the T2T Consortium has added four new WASH1 paralogues in previously unannotated subtelomeric regions. Here we show that one of these four novel WASH1 genes is the gene most likely to produce the functional WASH1 protein. We also demonstrate that the other twelve WASH1 genes derived from a single pseudogene on chromosome 12. These 12 genes include WASHC1, the gene currently annotated as the functional WASH1 gene. We propose that this novel gene should be annotated as a coding gene and that all functional information relating to the gene on chromosome 9 should be transferred to it. The remaining WASH1 genes, including WASHC1, should be annotated as pseudogenes. This work confirms that the T2T assembly has added at least one functionally relevant coding gene to the human reference set. It remains to be seen whether other important coding genes are missing from the GRCh38 reference assembly.