INSPER Institute of Education and Research, Sao Paulo, Brazil.
Centre for Genomic Regulation (CRG), Barcelona, Spain.
Nature. 2023 Oct;622(7981):41-47. doi: 10.1038/s41586-023-06490-x. Epub 2023 Oct 4.
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.