Volpe Emilia, Corda Luca, Di Tommaso Elena, Pelliccia Franca, Ottalevi Riccardo, Licastro Danilo, Guarracino Andrea, Capulli Mattia, Formenti Giulio, Tassone Evelyne, Giunta Simona
Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome "Sapienza", Piazzale Aldo Moro 5, 00185 Rome, Italy.
Department of Bioinformatics, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065, USA and S.s.17, 67100, L'Aquila, Italy.
bioRxiv. 2023 Dec 30:2023.11.01.565049. doi: 10.1101/2023.11.01.565049.
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question of whether human reference genomes are adequate for accurately analyzing sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly of human retinal pigment epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to serve as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly provides fully phased haplotypes and chromosome-level scaffolds that span centromeres, with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation of this cell line, including t(X;10), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted telomere of chromosome X. Polymorphisms between the two haplotypes of the same genome reveal genetic and epigenetic variation on all chromosomes, especially at centromeres. Using the RPE-1 assembly as a matched reference genome improves the mapping quality of multi-omics reads originating from RPE-1 cells, drastically reducing alignment mismatches compared with the most complete human reference to date (CHM13). Leveraging the accuracy achieved with a matched reference, we identified kinetochore sites at base-pair resolution and found unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multi-omics analyses and lays the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
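For readers less familiar with the QV metric: consensus quality is conventionally Phred-scaled, QV = -10 log10(E), where E is the per-base error rate, so >QV60 corresponds to fewer than one erroneous base per million. The minimal sketch below only performs this conversion. The function names are illustrative and not part of any assembly-evaluation tool, and the 3 Gb haplotype length in the example is a round illustrative figure, not the size of the RPE1v1.0 assembly.

```python
import math


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return -10 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Inverse conversion: per-base error rate implied by a QV value."""
    return 10 ** (-qv / 10)


def expected_errors(sequence_length_bp: int, qv: float) -> float:
    """Expected number of erroneous bases in a sequence of the given length
    at the given QV (illustrative helper, assumes errors are uniform)."""
    return sequence_length_bp * error_rate_from_qv(qv)


if __name__ == "__main__":
    # QV60 means one error per million bases.
    print(error_rate_from_qv(60))
    # Illustrative: a hypothetical 3 Gb haplotype at exactly QV60
    # would carry about 3,000 base errors; >QV60 means fewer.
    print(expected_errors(3_000_000_000, 60))
```

Because the scale is logarithmic, each additional 10 QV units reduces the expected error count tenfold, which is why QV is a compact way to compare the base accuracy of very large assemblies.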