Hu Yongli, Hase Takeshi, Li Hui Peng, Prabhakar Shyam, Kitano Hiroaki, Ng See Kiong, Ghosh Samik, Wee Lawrence Jin Kiat
Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore, Singapore.
The Systems Biology Institute, Singapore Node hosted at the Institute for Infocomm Research, A*STAR, Singapore, Singapore.
BMC Genomics. 2016 Dec 22;17(Suppl 13):1025. doi: 10.1186/s12864-016-3317-7.
The ability to sequence the transcriptomes of individual cells using single-cell RNA sequencing (RNA-seq) technologies marks a shift in the scientific paradigm: scientists can now investigate the complex biology of a heterogeneous cell population one cell at a time. However, to date there has been no suitable computational methodology for analysing such an intricate deluge of data, in particular techniques that aid the identification of the unique differences in transcriptomic profiles between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)).
Using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, we identified thirty-eight key transcripts that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) compared with commonly used statistical techniques or gene set-based approaches. Further downstream network reconstruction analysis was carried out to uncover hidden general regulatory networks, in which novel interactions could be validated by wet-lab experimentation and may prove useful candidate targets for the treatment of neuronal developmental diseases.
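The abstract does not give implementation details for SVM-RFE, so the following is only a minimal sketch of the generic procedure under stated assumptions: a linear SVM is approximated with Pegasos-style subgradient descent (a stand-in, not necessarily the solver the authors used), each remaining feature is scored by its squared weight, and one feature is removed per round until the requested number remains. The function names, gene names (`gene_A`, `gene_B`, `gene_C`) and toy expression values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Approximate a linear soft-margin SVM by Pegasos-style subgradient descent.

    X is a list of feature rows, y a list of labels in {-1, +1}. A constant 1.0
    is appended to each row so the last weight acts as a (regularised) bias;
    only the per-feature weights are returned.
    """
    rng = random.Random(seed)
    rows = [list(r) + [1.0] for r in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            # Gradient step for the L2 regulariser: shrink every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, applied only to margin violations.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w[:-1]  # drop the bias weight; it is never a ranking candidate


def svm_rfe(X, y, names, n_keep):
    """Recursive feature elimination: retrain on the surviving features,
    drop the one with the smallest squared weight, repeat until n_keep remain."""
    remaining = list(range(len(names)))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(subset, y)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[weakest]
    return [names[j] for j in remaining]


# Hypothetical toy data: gene_A tracks the cell-type label, gene_B and gene_C are noise.
data_rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[2.0 * lab + data_rng.uniform(-0.1, 0.1),
      data_rng.uniform(-1.0, 1.0),
      data_rng.uniform(-1.0, 1.0)] for lab in y]
print(svm_rfe(X, y, ["gene_A", "gene_B", "gene_C"], n_keep=1))
```

Removing one feature per round keeps the sketch simple; for transcriptome-scale inputs, eliminating a fraction of the features per round is a common way to reduce cost, though the abstract does not say which schedule the authors used.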
The novel approach reported here identifies transcripts with known neuronal involvement that optimally differentiate neocortical cells from neural progenitor cells. We believe it is extensible and applicable to other single-cell RNA-seq expression profiles, such as studies of cancer progression and treatment within highly heterogeneous tumours.