Center for Precision Genome Editing and Genetic Technologies for Biomedicine.
Group of Genome Spatial Organization, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia.
Bioinformatics. 2020 Nov 1;36(17):4560-4567. doi: 10.1093/bioinformatics/btaa555.
The application of genome-wide chromosome conformation capture (3C) methods to prokaryotes provided insights into the spatial organization of their genomes and identified patterns conserved across the tree of life, such as chromatin compartments and contact domains. Prokaryotic genomes vary in GC content and the density of restriction sites along the chromosome, suggesting that these properties should be considered when planning experiments and choosing appropriate software for data processing. Diverse algorithms are available for the analysis of eukaryotic chromatin contact maps, but their potential application to prokaryotic data has not yet been evaluated.
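To make the two genome properties named above concrete, the following minimal sketch computes GC content and the fragment lengths produced by a single restriction site on a circular chromosome. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy sequence and the choice of GATC as the recognition site are hypothetical, and sites that span the origin of the circular sequence are ignored for simplicity.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def site_positions(seq: str, site: str) -> List[int]:
    """Start positions of every occurrence of the recognition site.

    Overlapping occurrences are reported; occurrences that wrap around
    the origin of a circular sequence are NOT (simplifying assumption).
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, circular: bool = True) -> List[int]:
    """Lengths of the fragments between consecutive recognition sites."""
    cuts = site_positions(seq, site)
    if not cuts:
        return [len(seq)]
    inner = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last fragment wraps from the final site back to the first one.
        return inner + [len(seq) - cuts[-1] + cuts[0]]
    return [cuts[0]] + inner + [len(seq) - cuts[-1]]


# Toy example (hypothetical sequence, not real genomic data).
toy_genome = "AAGATCTTGATCCC"
toy_fragments = fragment_lengths(toy_genome, "GATC")
toy_gc = gc_content(toy_genome)
```

Comparing the fragment-length distributions obtained for several candidate sites on the same genome gives a rough idea of which enzyme would yield the finest achievable map resolution.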
Here, we present a comparative analysis of domain calling algorithms using available single-microbe experimental data. We evaluated the algorithms' intra-dataset reproducibility, concordance with other tools and sensitivity to coverage and resolution of contact maps. Using RNA-seq as an example, we showed how orthogonal biological data can be utilized to validate the reliability and significance of annotated domains. We also suggest that in silico simulations of contact maps can be used to choose optimal restriction enzymes and estimate theoretical map resolutions before the experiment. Our results provide guidelines for researchers investigating microbes and microbial communities using high-throughput 3C assays such as Hi-C and 3C-seq.
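One simple way to quantify concordance between two sets of domain boundaries (from two tools, or from two replicates) is sketched below. This is a minimal illustration and not necessarily the metric used in the study: the function names, the tolerance of one bin and the example boundary lists are all assumptions made for the example.

```python
from typing import Sequence


def matched_count(query: Sequence[int], reference: Sequence[int],
                  tolerance: int = 1) -> int:
    """Number of query boundaries lying within `tolerance` bins of any
    reference boundary."""
    return sum(
        any(abs(q - r) <= tolerance for r in reference)
        for q in query
    )


def boundary_concordance(a: Sequence[int], b: Sequence[int],
                         tolerance: int = 1) -> float:
    """Symmetric concordance: share of all boundaries (from both sets)
    that have a partner in the other set within `tolerance` bins.

    Returns 0.0 when both sets are empty.
    """
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    matched = matched_count(a, b, tolerance) + matched_count(b, a, tolerance)
    return matched / total


# Hypothetical boundary positions (in bin units) from two domain callers.
caller_a = [10, 25, 40]
caller_b = [11, 30, 40]
score = boundary_concordance(caller_a, caller_b, tolerance=1)
```

The tolerance matters because callers often place the same boundary one bin apart; it is best chosen relative to the map resolution rather than fixed.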
The code of the analysis is available at https://github.com/magnitov/prokaryotic_cids.
Supplementary data are available at Bioinformatics online.