Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
Skolkovo Institute of Science and Technology, Skolkovo, Russia, 143026.
Sci Rep. 2020 Jul 9;10(1):11398. doi: 10.1038/s41598-020-68182-0.
Chromatin communities stabilized by protein machinery play an essential role in gene regulation and refine the global polymeric folding of the chromatin fiber. However, treating these communities within the framework of classical network theory (the stochastic block model, SBM) does not take into account the intrinsic linear connectivity of chromatin loci. Here we propose the polymer block model, paving the way for community detection in polymer networks. On the basis of this new model, we modify the non-backtracking flow operator and suggest the first protocol for annotating compartmental domains in sparse single-cell Hi-C matrices. In particular, we prove that our approach corresponds to the maximum entropy principle. The benchmark analysis demonstrates that the spectrum of the polymer non-backtracking operator resolves the true compartmental structure up to the theoretical detectability threshold, while all commonly used operators fail above it. We test various operators on real data and conclude that the sizes of the non-backtracking single-cell domains are closest to the sizes of compartments from the population data. Moreover, the domains found clearly segregate by gene density and correlate with the population compartmental mask, corroborating the biological significance of our annotation of chromatin compartmental domains in single-cell Hi-C matrices.
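For readers unfamiliar with the non-backtracking operator mentioned in the abstract, the following is a minimal sketch of the classical (Hashimoto) construction on an undirected graph. It is not the polymer-modified operator proposed in the paper, whose definition is not given in this record. The function name `non_backtracking_matrix` and the toy graph are illustrative assumptions.

```python
def non_backtracking_matrix(edges):
    """Build the classical non-backtracking matrix of an undirected simple graph.

    Rows and columns are indexed by directed edges. B[i][j] = 1 when
    directed edge i = (u -> v) can be followed by directed edge
    j = (v -> w) with w != u, i.e. the walk does not immediately return.
    """
    # Each undirected edge contributes two directed edges.
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))

    n = len(directed)
    B = [[0] * n for _ in range(n)]
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if v == w and x != u:
                B[i][j] = 1
    return directed, B


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
directed, B = non_backtracking_matrix(toy_edges)
```

In spectral community detection, the leading eigenvectors of such a matrix are typically used to assign nodes to groups. That step is omitted here because it depends on details of the paper's method that this abstract does not specify.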