Amaral Paulo, Carbonell-Sala Silvia, De La Vega Francisco M, Faial Tiago, Frankish Adam, Gingeras Thomas, Guigo Roderic, Harrow Jennifer L, Hatzigeorgiou Artemis G, Johnson Rory, Murphy Terence D, Pertea Mihaela, Pruitt Kim D, Pujar Shashikant, Takahashi Hazuki, Ulitsky Igor, Varabyou Ales, Wells Christine A, Yandell Mark, Carninci Piero, Salzberg Steven L
INSPER Institute of Education and Research, São Paulo, SP, Brazil.
Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain.
ArXiv. 2023 Mar 24:arXiv:2303.13996v1.
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.