Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA.
Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987. Epub 2022 Mar 31.
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.