Liang Shaoheng, Mohanty Vakul, Dou Jinzhuang, Miao Qi, Huang Yuefan, Müftüoğlu Muharrem, Ding Li, Peng Weiyi, Chen Ken
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA.
Department of Computer Science, Rice University, Houston, Texas, 77005, USA.
Nat Comput Sci. 2021 May;1(5):374-384. doi: 10.1038/s43588-021-00070-7. Epub 2021 May 20.
A key challenge in studying organisms and diseases is to detect rare molecular programs and rare cell populations (RCPs) that drive development, differentiation, and transformation. Molecular features such as genes and proteins defining RCPs are often unknown and difficult to detect from unenriched single-cell data using conventional dimensionality-reduction and clustering-based approaches. Here, we propose an unsupervised approach, SCMER (Single-Cell Manifold presERving feature selection), which selects a compact set of molecular features with definitive meanings that preserve the manifold of the data. We applied SCMER in the context of hematopoiesis, lymphogenesis, tumorigenesis, and drug resistance and response. We found that SCMER can identify non-redundant features that sensitively delineate both common cell lineages and rare cellular states. SCMER can be used for discovering molecular features in a high-dimensional dataset, designing targeted, cost-effective assays for clinical applications, and facilitating multi-modality integration.
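To make the idea of manifold-preserving feature selection concrete, the following is a minimal toy sketch and not the SCMER algorithm itself. It assumes a Gaussian cell-cell similarity with an arbitrary bandwidth, a KL-divergence loss between the full-feature and reduced-feature similarities, and a simple greedy forward search. The function names (`similarity`, `kl_divergence`, `greedy_select`) and the six-cell matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def similarity(X, two_sigma_sq=25.0):
    """Row-normalized Gaussian similarities between cells (rows of X).

    The bandwidth `two_sigma_sq` is an arbitrary choice for this toy example.
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / two_sigma_sq)
    np.fill_diagonal(K, 0.0)  # a cell is not its own neighbor
    return K / K.sum(axis=1, keepdims=True)


def kl_divergence(P, Q):
    """KL(P || Q) summed over all off-diagonal cell pairs."""
    off_diag = ~np.eye(P.shape[0], dtype=bool)
    p, q = P[off_diag], Q[off_diag]
    return float(np.sum(p * np.log(p / q)))


def greedy_select(X, k):
    """Greedily pick k features whose similarities best match the full data."""
    P = similarity(X)  # target: similarities computed from all features
    chosen = []
    for _ in range(k):
        best, best_loss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            loss = kl_divergence(P, similarity(X[:, chosen + [j]]))
            if loss < best_loss:
                best, best_loss = j, loss
        chosen.append(best)
    return sorted(chosen)


# Hypothetical data: 6 cells x 4 features.
# Feature 0 separates two lineages, feature 2 marks a single rare cell,
# features 1 and 3 are constant and carry no structure.
X = np.array([
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [5, 1, 0, 1],
    [5, 1, 0, 1],
    [5, 1, 5, 1],
], dtype=float)

selected = greedy_select(X, k=2)
print(selected)
```

In this toy setting the search keeps the lineage marker and the rare-cell marker and discards the constant features, because those two columns alone reproduce the full similarity structure. A real application would need a scalable optimizer and a principled choice of similarity; both are outside the scope of this sketch.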