Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA.
Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
Genome Res. 2019 Mar;29(3):472-484. doi: 10.1101/gr.234948.118. Epub 2019 Feb 8.
K562 is widely used in biomedical research. It is one of the three tier-one cell lines of ENCODE and is also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, and structural variants (SVs), including small- and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within a tumor suppressor gene. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
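To illustrate why allele-specific analyses need to take aneuploidy into account, here is a minimal Python sketch. It is not the authors' pipeline. It assumes that, at a heterozygous site, the expected fraction of reads carrying the alternate allele equals the fraction of chromosome copies carrying it, and it tests the observed read counts against that expectation with an exact two-sided binomial test. All function names and the example counts are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null success probability p.
    """
    observed = binom_pmf(k, n, p)
    total = sum(
        pr
        for pr in (binom_pmf(i, n, p) for i in range(n + 1))
        if pr <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


def expected_alt_fraction(alt_copies: int, total_copies: int) -> float:
    """Null alternate-allele read fraction at a heterozygous site.

    Assumption (for illustration only): each chromosome copy contributes
    equally, so the null fraction is alt_copies / total_copies.
    """
    if not 0 < alt_copies < total_copies:
        raise ValueError("site must be heterozygous: 0 < alt_copies < total_copies")
    return alt_copies / total_copies


def allele_specific_p(alt_reads: int, ref_reads: int,
                      alt_copies: int, total_copies: int) -> float:
    """P-value for allelic imbalance relative to the CN-aware null."""
    p0 = expected_alt_fraction(alt_copies, total_copies)
    return binom_two_sided_p(alt_reads, alt_reads + ref_reads, p0)


if __name__ == "__main__":
    # Hypothetical site: 20 alternate reads, 10 reference reads.
    print("diploid null (1 of 2 copies):", allele_specific_p(20, 10, 1, 2))
    print("CN = 3 null (2 of 3 copies): ", allele_specific_p(20, 10, 2, 3))
```

In this hypothetical example, a 2:1 read ratio departs from the diploid expectation of 1:1 but matches the expectation exactly in a CN = 3 segment where two of the three copies carry the alternate allele. The same counts can therefore look like allelic imbalance or like no imbalance at all, depending on whether copy number is modeled.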