Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences.
Faculty of Medicine, Institute of Clinical Sciences, Imperial College London, Hammersmith Campus, London, UK.
Bioinformatics. 2019 Jul 15;35(14):2354-2361. doi: 10.1093/bioinformatics/bty1014.
Clusters of extremely conserved non-coding elements (CNEs) mark genomic regions devoted to cis-regulation of key developmental genes in Metazoa. We have recently shown that their span coincides with that of topologically associating domains (TADs), making them useful for estimating conserved TAD boundaries in the absence of Hi-C data. The standard approach, which detects CNEs in genome alignments and then establishes the boundaries of their clusters, requires tuning of several parameters and breaks down when comparing closely related genomes.
We present a novel, kurtosis-based measure of pairwise non-coding conservation that requires no pre-set thresholds for conservation level and length of CNEs. We show that it performs robustly across a large span of evolutionary distances, including across the closely related genomes of primates for which standard approaches fail. The method is straightforward to implement and enables detection and comparison of clusters of CNEs and estimation of underlying TADs across a vastly increased range of Metazoan genomes.
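To illustrate the general idea of a threshold-free, kurtosis-based summary, the following is a minimal sketch and not the authors' implementation (their scripts are in the repository linked in this record). It assumes a hypothetical input: a list of per-position pairwise identity scores between 0 and 1 taken from a genome alignment. It computes the excess kurtosis of those scores in sliding windows. The function names, the window and step parameters, and the choice of population excess kurtosis are all assumptions made for illustration.

```python
import math


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3.

    Returns NaN when the values have zero variance, since the ratio
    is undefined in that case.
    """
    n = len(values)
    if n == 0:
        return float("nan")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return float("nan")
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def windowed_kurtosis(scores, window, step):
    """Excess kurtosis of identity scores in sliding windows.

    `scores` is a hypothetical per-position identity track; `window`
    and `step` are illustrative parameters, not values from the paper.
    Returns a list of (window_start, kurtosis) tuples.
    """
    results = []
    for start in range(0, len(scores) - window + 1, step):
        chunk = scores[start:start + window]
        results.append((start, excess_kurtosis(chunk)))
    return results


if __name__ == "__main__":
    # Toy track: an evenly mixed window followed by a window in which
    # a single position stands out from a flat background.
    track = [0, 1, 0, 1, 0, 0, 0, 1]
    for start, k in windowed_kurtosis(track, window=4, step=4):
        print(start, round(k, 3))
```

In this toy track the second window, where one position departs from an otherwise flat background, scores higher than the evenly mixed first window. This shows how kurtosis responds to the shape of the score distribution without any cutoff on conservation level or element length.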
The data generated for this study, and the scripts used to generate the data, can be found at https://github.com/alexander-nash/kurtosis_conservation.
Supplementary data are available at Bioinformatics online.