Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Oxford Nanopore Technologies Inc., Oxford, UK.
Nature. 2023 Sep;621(7978):344-354. doi: 10.1038/s41586-023-06457-y. Epub 2023 Aug 23.
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.