Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs.

Affiliations

Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany.

Signalling Research Centre CIBSS, University of Freiburg, 79104 Freiburg, Germany.

Publication information

Bioinformatics. 2021 Nov 18;37(22):4006-4013. doi: 10.1093/bioinformatics/btab394.

Abstract

MOTIVATION

Hi-C technology provides insights into the 3D organization of chromatin, and single-cell Hi-C lets researchers study the chromatin state at the level of individual cells. Single-cell Hi-C interaction matrices are high-dimensional and very sparse. To cluster thousands of such matrices, each one is flattened and all of them are compiled into a single matrix. Depending on the resolution, this matrix can have a few million or even billions of features, so computations can be memory-intensive. We present a single-cell Hi-C clustering approach that uses an approximate nearest neighbors method based on locality-sensitive hashing to reduce both the dimensionality and the computational resources required.
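The idea of flattening sparse contact matrices and finding approximate neighbors through hashing can be illustrated with a minimal sketch. It uses MinHash purely as one example of a locality-sensitive hashing family; the abstract does not say which scheme the authors use, and this is not the scHiCExplorer or sparse-neighbors-search code. All function names, the toy cells, and the parameters (`num_hashes`, `k`, `PRIME`) are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as hash modulus (illustrative choice)

def flatten_nonzero(matrix):
    """Flatten the upper triangle of a symmetric contact matrix and return
    the set of flattened indices that hold a non-zero contact count."""
    n = len(matrix)
    return {i * n + j for i in range(n) for j in range(i, n) if matrix[i][j]}

def make_hashes(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash_signature(features, hashes):
    """One minimum per hash function; assumes `features` is non-empty."""
    return [min((a * x + b) % PRIME for x in features) for a, b in hashes]

def approximate_knn(signatures, k):
    """Rank cells by the fraction of matching signature positions, which
    estimates the Jaccard similarity of their non-zero feature sets."""
    neighbors = []
    for i, sig_i in enumerate(signatures):
        scored = []
        for j, sig_j in enumerate(signatures):
            if i == j:
                continue
            matches = sum(u == v for u, v in zip(sig_i, sig_j))
            scored.append((matches / len(sig_i), j))
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
        neighbors.append([j for _, j in scored[:k]])
    return neighbors

def toy_cell(n_bins, offsets, extra):
    """Hypothetical cell: contacts at the given diagonal offsets plus one
    cell-specific contact, stored as a symmetric 0/1 matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    pairs = [(i, i + d) for d in offsets for i in range(n_bins - d)]
    pairs.append(extra)
    for i, j in pairs:
        matrix[i][j] = matrix[j][i] = 1
    return matrix

n_bins = 12
# Cells 0-2 share short-range contacts; cells 3-5 share longer-range contacts.
cells = [toy_cell(n_bins, (0, 1), (c, 11)) for c in range(3)]
cells += [toy_cell(n_bins, (4, 5), (c, 11)) for c in range(3, 6)]

hashes = make_hashes(64)
signatures = [minhash_signature(flatten_nonzero(m), hashes) for m in cells]
neighbors = approximate_knn(signatures, k=2)
print(neighbors)
```

Each cell is reduced from its full flattened feature set to a signature of 64 integers, so the neighbor search works on a much smaller representation. For brevity this sketch still compares every pair of signatures; practical LSH implementations typically group signatures into buckets so that only likely neighbors are compared.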

RESULTS

The presented method can process a 10 kb single-cell Hi-C dataset with 2600 cells using 40 GB of memory, whereas competing approaches cannot complete the computation even with 1 TB of memory. We show that the method separates cells by their chromatin folding properties better than competing algorithms and therefore yields higher-quality clustering of single-cell Hi-C data.
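A rough back-of-the-envelope calculation gives a feeling for the scale involved. The sketch below counts the features of one flattened whole-genome matrix and the size of a naive dense cells-by-features matrix. The genome length of 2.7 Gb, the inclusion of all bin pairs, and the 4 bytes per value are assumptions made only for illustration; the abstract states none of them, nor how competing methods store their data.

```python
def flattened_feature_count(genome_length_bp, resolution_bp):
    """Number of bin pairs in the upper triangle (diagonal included)."""
    n_bins = -(-genome_length_bp // resolution_bp)  # integer ceiling division
    return n_bins * (n_bins + 1) // 2

def dense_memory_bytes(n_cells, n_features, bytes_per_value=4):
    """Size of a dense cells x features matrix (hypothetical storage layout)."""
    return n_cells * n_features * bytes_per_value

GENOME_BP = 2_700_000_000  # assumed genome length, not taken from the abstract

features_1mb = flattened_feature_count(GENOME_BP, 1_000_000)
features_10kb = flattened_feature_count(GENOME_BP, 10_000)
dense_10kb = dense_memory_bytes(2600, features_10kb)

print(features_1mb)       # 3646350
print(features_10kb)      # 36450135000
print(dense_10kb / 1e12)  # dense size in terabytes
```

Under these assumptions a 1 Mb matrix has a few million features and a 10 kb matrix has tens of billions, which matches the range given in the motivation. A dense matrix for 2600 cells at 10 kb would need hundreds of terabytes, which is why sparse storage and dimensionality reduction matter at this resolution.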

AVAILABILITY AND IMPLEMENTATION

The presented clustering algorithm is part of scHiCExplorer and is available on GitHub at https://github.com/joachimwolff/scHiCExplorer and as a conda package via the bioconda channel. The approximate nearest neighbors implementation is available at https://github.com/joachimwolff/sparse-neighbors-search and as a conda package via the bioconda channel.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
