scHiClassifier: a deep learning framework for cell type prediction by fusing multiple feature sets from single-cell Hi-C data.

Author information

Zhou Xiangfei, Wu Hao

Affiliations

School of Software, Shandong University, No. 1500, Shunhua Road, Hi-Tech Industrial Development Zone, Jinan 250100, Shandong, China.

Shenzhen Research Institute of Shandong University, Shandong University, No. 19, Gaoxin South 4th Road, Nanshan District, Shenzhen 518063, Guangdong, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf009.

Abstract

Single-cell high-throughput chromosome conformation capture (Hi-C) technology captures chromosomal spatial structure at the level of individual cells. However, investigating how chromosomal structure changes across cell types requires methods that can identify cell types from single-cell Hi-C data. Current frameworks for cell type prediction based on single-cell Hi-C data are limited: they often struggle with feature interpretability and biological significance, and they lack convincing, robust validation of classification performance. In this study, we propose four new feature sets based on the contact matrix, each with clear interpretability and biological significance. We then develop a novel deep learning framework, scHiClassifier, based on a multi-head self-attention encoder, 1D convolution, and feature fusion, which integrates information from these four feature sets to predict cell types accurately. Through comprehensive comparison experiments against benchmark frameworks on six datasets, we demonstrate the superior classification performance and the universality of scHiClassifier. We further assess its robustness through data perturbation and data dropout experiments. Comparisons of different feature set combinations show that using all four feature sets yields the best performance. Comparison with four unsupervised dimensionality reduction methods demonstrates the effectiveness and superiority of the multiple feature set extraction. Additionally, we analyze the importance of different feature sets and chromosomes using the SHapley Additive exPlanations (SHAP) method. Finally, enrichment analysis supports the accuracy and reliability of scHiClassifier in classifying cells from single-cell Hi-C data.
The source code of scHiClassifier is freely available at https://github.com/HaoWuLab-Bioinformatics/scHiClassifier.
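
As a rough illustration of the three building blocks named in the abstract (self-attention, 1D convolution, and feature fusion), the following is a minimal pure-Python sketch. It is not the scHiClassifier implementation: the data layout (each feature set as a list of per-chromosome vectors), the single attention head with identity projections, the mean pooling step, the fixed kernel, and the function names `self_attention`, `conv1d`, and `fuse` are all assumptions made for illustration. The actual architecture and feature definitions are in the repository linked above.

```python
import math


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves, i.e. the learned projections are replaced by the identity.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation deep learning
    libraries usually call 1D convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def fuse(feature_sets, kernel):
    """Encode each feature set separately, then fuse by concatenation."""
    fused = []
    for tokens in feature_sets:
        attended = self_attention(tokens)
        d = len(attended[0])
        # Mean-pool over chromosomes (hypothetical pooling choice).
        pooled = [sum(row[j] for row in attended) / len(attended) for j in range(d)]
        fused.extend(conv1d(pooled, kernel))
    return fused


# Toy cell (made-up numbers): 4 feature sets, 3 chromosomes, 5 values each.
cell = [
    [[0.1 * (s + c + j) for j in range(5)] for c in range(3)]
    for s in range(4)
]
fused = fuse(cell, kernel=[0.25, 0.5, 0.25])
print(len(fused))  # 4 feature sets x 3 convolution outputs = 12
```

In a trained model the fused vector would feed a classification layer, and the projections and kernels would be learned rather than fixed as they are here.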
