Single-cell specific and interpretable machine learning models for sparse scChIP-seq data imputation.

Affiliations

Institute of Organismic and Molecular Evolution (iOME), Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany.

Institute of Molecular Biology, Mainz, Germany.

Publication information

PLoS One. 2022 Jul 1;17(7):e0270043. doi: 10.1371/journal.pone.0270043. eCollection 2022.

DOI:10.1371/journal.pone.0270043
PMID:35776722
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9249201/
Abstract

MOTIVATION

Single-cell Chromatin ImmunoPrecipitation DNA-Sequencing (scChIP-seq) analysis is challenging due to data sparsity. A high degree of sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method that leverages predictive information within bulk data from the ENCODE project to impute missing protein-DNA interacting regions of target histone marks or transcription factors.

RESULTS

Imputations using machine learning models trained for each single cell, each ChIP protein target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene identification on real human data. Results on bulk data simulating single cells show that the imputations are single-cell specific, as the imputed profiles are closer to the simulated cell than to other cells related to the same ChIP protein target and the same cell type. Simulations also show that 100 input genomic regions are already enough to train single-cell specific models for the imputation of thousands of undetected regions. Furthermore, SIMPA enables the interpretation of machine learning models by revealing the interaction sites of a given single cell that are most important for the imputation model trained for a specific genomic region. The corresponding feature importance values derived from promoter-interaction profiles of H3K4me3, an activating histone mark, highly correlate with co-expression of genes that are present within the cell-type specific pathways in two real human and mouse datasets. SIMPA's interpretable imputation method allows users to gain a deep understanding of individual cells and, consequently, of sparse scChIP-seq datasets.
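The per-cell, per-region modelling idea described above can be illustrated with a small sketch. This is not the SIMPA implementation: the abstract does not name the classifier, so a Bernoulli naive Bayes model stands in here, the absolute log-likelihood ratio is used as a simple importance proxy, and the bulk matrix, region names, and all function names are hypothetical. The only idea taken from the abstract is the setup: the regions observed in one single cell serve as features, bulk experiments serve as training instances, and one model is trained per candidate region.

```python
import math

# Toy bulk reference (made-up values): each row is one hypothetical bulk
# ChIP-seq experiment, each column a genomic region (1 = peak present).
REGIONS = ["r1", "r2", "r3", "r4", "r5"]
BULK = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
]


def train_region_model(bulk, feature_idx, target_idx, alpha=1.0):
    """Fit a Bernoulli naive Bayes model for one candidate region.

    Features: presence of the single cell's observed regions in each bulk
    experiment. Label: presence of the candidate region in that experiment.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    counts = {0: 0, 1: 0}
    present = {0: [0] * len(feature_idx), 1: [0] * len(feature_idx)}
    for row in bulk:
        label = row[target_idx]
        counts[label] += 1
        for k, j in enumerate(feature_idx):
            present[label][k] += row[j]
    total = len(bulk)
    prior = {y: (counts[y] + alpha) / (total + 2 * alpha) for y in (0, 1)}
    cond = {
        y: [(present[y][k] + alpha) / (counts[y] + 2 * alpha)
            for k in range(len(feature_idx))]
        for y in (0, 1)
    }
    return prior, cond


def impute_probability(model):
    """Probability that the candidate region is present in the single cell.

    The cell's feature vector is all ones, because the features are exactly
    the regions observed in that cell.
    """
    prior, cond = model
    log_score = {
        y: math.log(prior[y]) + sum(math.log(p) for p in cond[y])
        for y in (0, 1)
    }
    top = max(log_score.values())
    weight = {y: math.exp(log_score[y] - top) for y in (0, 1)}
    return weight[1] / (weight[0] + weight[1])


def feature_importance(model):
    """Absolute log-likelihood ratio per feature, a simple importance proxy."""
    _, cond = model
    return [abs(math.log(cond[1][k] / cond[0][k])) for k in range(len(cond[1]))]


def impute_cell(bulk, observed_idx, n_regions):
    """Train one model per unobserved region and score it for this cell."""
    result = {}
    for target in range(n_regions):
        if target in observed_idx:
            continue
        model = train_region_model(bulk, observed_idx, target)
        result[target] = (impute_probability(model), feature_importance(model))
    return result


observed = [0, 1]  # the sparse single cell has peaks in r1 and r2 only
imputed = impute_cell(BULK, observed, len(REGIONS))
for target, (prob, importance) in sorted(imputed.items()):
    print(REGIONS[target], round(prob, 3), [round(v, 3) for v in importance])
```

In this toy setting, a candidate region that co-occurs with the cell's observed regions across bulk experiments receives a higher imputed probability, and the per-feature scores indicate which observed regions drive that prediction for a given candidate region.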

AVAILABILITY AND IMPLEMENTATION

Our interpretable imputation algorithm was implemented in Python and is available at https://github.com/salbrec/SIMPA.
