MISC: missing imputation for single-cell RNA sequencing data.

Authors

Yang Mary Qu, Weissman Sherman M, Yang William, Zhang Jialing, Canaann Allon, Guan Renchu

Affiliations

Joint Bioinformatics Program, University of Arkansas Little Rock George Washington Donaghey College of Engineering & IT and University of Arkansas for Medical Sciences, Little Rock, AR, 72204, USA.

Department of Genetics, Yale University, New Haven, CT, 06512, USA.

Publication information

BMC Syst Biol. 2018 Dec 14;12(Suppl 7):114. doi: 10.1186/s12918-018-0638-y.

DOI: 10.1186/s12918-018-0638-y
PMID: 30547798
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6293493/

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to low capture efficiency and stochastic gene expression, scRNA-seq data often contain a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know three things: where the data are missing, how much data is missing, and what the missing values are.

METHODS

To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results with the zero-inflated model and false negative model results. Finally, we used a regression model to recover the data in the missing elements.
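The last two steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the classifier, the zero-inflated model, the false negative model, or the regression features, so the three boolean masks are taken as given inputs, and the single predictor (the per-cell mean of the other genes) is an assumption chosen only for illustration. The names `intersect_masks` and `impute_by_regression` are hypothetical.

```python
import numpy as np


def intersect_masks(clf_mask, zi_mask, fn_mask):
    """Flag an entry as missing only if all three sources agree."""
    return clf_mask & zi_mask & fn_mask


def impute_by_regression(X, missing):
    """Fill flagged entries of X (cells x genes), one gene at a time.

    Assumption for illustration: each gene is regressed (ordinary least
    squares) on one predictor, the per-cell mean of the other genes'
    unflagged values.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n_cells, n_genes = X.shape
    observed = ~missing
    for g in range(n_genes):
        rows = missing[:, g]
        if not rows.any():
            continue  # nothing to impute for this gene
        others = np.arange(n_genes) != g
        counts = observed[:, others].sum(axis=1)
        sums = (X[:, others] * observed[:, others]).sum(axis=1)
        # Cells with no usable other gene get predictor 0 (intercept only).
        predictor = np.divide(sums, counts, out=np.zeros(n_cells), where=counts > 0)
        train = observed[:, g] & (counts > 0)
        if train.sum() < 2:
            continue  # too few observed cells to fit a line
        A = np.column_stack([np.ones(train.sum()), predictor[train]])
        coef, *_ = np.linalg.lstsq(A, X[train, g], rcond=None)
        out[rows, g] = coef[0] + coef[1] * predictor[rows]
    return out


# Toy data: gene 0 is exactly twice gene 1; the last cell's gene 0 dropped out.
X = np.array([[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [0.0, 4.0]])
zeros = X == 0
# Stand-ins for the classifier, zero-inflated, and false negative results.
clf_mask, zi_mask, fn_mask = zeros.copy(), zeros.copy(), zeros.copy()
missing = intersect_masks(clf_mask, zi_mask, fn_mask)
imputed = impute_by_regression(X, missing)
print(imputed)
```

Taking the intersection is the conservative choice: a zero is treated as missing only when every source flags it, so true biological zeros are less likely to be overwritten.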

RESULTS

We compared raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data and on cells from the primary somatosensory cortex and the hippocampal CA1 region of the mouse brain. On the CML data, MISC discovered a trajectory branch from CP-CML to BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divides the pyramidal CA1 cells into different branches, which is direct evidence of subpopulations within pyramidal CA1. In addition, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary.

CONCLUSIONS

Our results showed that the MISC model improved cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/bd1a5868b66f/12918_2018_638_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/7bd8000a7f96/12918_2018_638_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/f84a89a8a710/12918_2018_638_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/fa004ac72546/12918_2018_638_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/b8f12b6b8fa0/12918_2018_638_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e65/6293493/e3a39977456b/12918_2018_638_Fig6_HTML.jpg