MIMIC: an optimization method to identify cell type-specific marker panel for cell sorting.

Affiliations

Department of Mathematics, Huazhong University of Science and Technology, Wuhan, China.

Department of Genetics and Biochemistry, Clemson University, SC, USA.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab235.

DOI:10.1093/bib/bbab235

PMID:34180954

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8575015/

Abstract

Multi-omics data allow us to select a small set of informative markers for discriminating specific cell types and studying cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types while maximizing the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, which covers 29 diverse tissues, with 43 surface markers (SMs) and 1345 transcription factors (TFs) as candidate markers. MIMIC selects biologically meaningful markers and is robust to different accuracy criteria. It shows advantages over standard single gene-based approaches and widely used dimensionality reduction methods, such as multidimensional scaling and t-SNE, in both accuracy and biological interpretability. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveals the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.
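
To make the idea of choosing a fixed number of markers while preserving the relationships among cell types more concrete, here is a minimal Python sketch. It is not the authors' mixed integer programming formulation: it swaps in an exhaustive search over all panels of size k and uses a toy objective (the correlation between cell type distances computed from the panel and from all markers). The function names, the objective and the expression matrix are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows (cell types)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def select_panel(expr, k):
    """Brute-force stand-in for an optimization model (assumption, not MIMIC):
    pick k marker columns whose cell-type distances correlate best with the
    distances computed from the full marker set."""
    n_types, n_markers = expr.shape
    upper = np.triu_indices(n_types, k=1)       # each cell-type pair once
    full = pairwise_distances(expr)[upper]
    best_panel, best_score = None, -np.inf
    for panel in combinations(range(n_markers), k):
        sub = pairwise_distances(expr[:, list(panel)])[upper]
        if sub.std() == 0:                      # panel cannot separate anything
            continue
        score = np.corrcoef(full, sub)[0, 1]
        if score > best_score:
            best_panel, best_score = panel, score
    return best_panel, best_score


# Hypothetical toy matrix: 4 cell types (rows) x 5 markers (columns).
# Only markers 0 and 1 vary across cell types; the rest are constant.
expr = np.array([
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 1.0, 1.0],
    [5.0, 0.0, 1.0, 1.0, 1.0],
    [5.0, 3.0, 1.0, 1.0, 1.0],
])

panel, score = select_panel(expr, k=2)
print(panel, round(score, 6))
```

Exhaustive search is only feasible for tiny inputs like this one. With thousands of candidate markers the number of panels grows combinatorially, which is the kind of setting where an integer programming formulation such as the one the abstract describes becomes relevant.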
