Moore Jill E, Pratt Henry E, Fan Kaili, Phalke Nishigandha, Fisher Jonathan, Elhajjajy Shaimae I, Andrews Gregory, Gao Mingshi, Shedd Nicole, Fu Yu, Lacadie Matthew C, Meza Jair, Ganna Mohit, Choudhury Eva, Swofford Ross, Farrell Nina P, Pampari Anusri, Ramalingam Vivekanandan, Reese Fairlie, Borsari Beatrice, Yu Michelle, Wattenberg Eve, Ruiz-Romero Marina, Razavi-Mohseni Milad, Xu Jinrui, Galeev Timur, Beer Michael A, Guigó Roderic, Gerstein Mark, Engreitz Jesse, Ljungman Mats, Reddy Timothy E, Snyder Michael P, Epstein Charles B, Gaskell Elizabeth, Bernstein Bradley E, Dickel Diane E, Visel Axel, Pennacchio Len A, Mortazavi Ali, Kundaje Anshul, Weng Zhiping
Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
bioRxiv. 2024 Dec 26:2024.12.26.629296. doi: 10.1101/2024.12.26.629296.
Mammalian genomes contain millions of regulatory elements that control complex patterns of gene expression. Previously, the ENCODE Consortium mapped biochemical signals across many cell types and tissues and integrated these data to develop a Registry of 0.9 million human and 300 thousand mouse candidate cis-Regulatory Elements (cCREs) annotated with potential functions. We have expanded the Registry to include 2.35 million human and 927 thousand mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded Registry covers hundreds of unique cell and tissue types, providing a comprehensive view of gene regulation. Functional characterization data from STARR-seq, MPRA, CRISPR perturbation, and transgenic mouse assays now cover over 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer/silencer roles in different cellular contexts. Integrating the Registry with other ENCODE annotations facilitates the interpretation of genetic variation and the identification of trait-associated genes, exemplified by the discovery of a novel causal gene for red blood cell traits. This expanded Registry is a valuable resource for studying the regulatory genome and its impact on health and disease.