Jia Xingang, Liu Yisu, Han Qiuhong, Lu Zuhong
School of Mathematics, Southeast University, Nanjing, China.
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
FEBS Open Bio. 2017 Nov 13;7(12):2008-2020. doi: 10.1002/2211-5463.12327. eCollection 2017 Dec.
Analysis of gene expression data by clustering and visualization has played a central role in obtaining biological knowledge. Here, we used Pearson's correlation coefficient of multiple-cumulative probabilities (PCC-MCP) of genes to define the similarity of gene expression behaviors. To address the challenge of high-dimensional MCPs, we used icc-cluster, a clustering algorithm that obtains solutions by iterating clustering centers, with PCC-MCP to group genes. We then used t-statistic stochastic neighbor embedding (t-SNE) of KC-data to generate optimal maps for clusters of MCP (t-SNE-MCP-O maps). From the analysis of several transcriptome data sets, we demonstrated clear advantages of using icc-cluster with PCC-MCP over commonly used clustering methods. t-SNE-MCP-O was also shown to give clearly projecting boundaries for clusters of PCC-MCP, which made the relationships between clusters easy to visualize and understand.
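The abstract does not spell out how MCPs are computed or how icc-cluster updates its centers, so the following is only a minimal sketch of the general idea it names: grouping genes by Pearson correlation while iterating cluster centers. It assumes each gene is already represented by a precomputed feature vector (a stand-in for its MCP profile). The function names, the farthest-first initialization, and the mean-based center update are illustrative assumptions, not the authors' icc-cluster.

```python
import numpy as np


def standardize_rows(X):
    """Center each row and scale it to unit length.

    After this step, the dot product of two rows equals their
    Pearson correlation coefficient.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant rows become all-zero vectors
    return Xc / norms


def correlation_center_clustering(X, k, n_iter=100):
    """Cluster rows of X by Pearson correlation to iteratively updated centers.

    Hypothetical helper: a generic center-iteration loop, not icc-cluster.
    """
    Z = standardize_rows(X)

    # Assumed initialization: start from row 0, then repeatedly add the row
    # that is least correlated with the centers chosen so far.
    chosen = [0]
    for _ in range(1, k):
        sim_to_chosen = Z @ Z[chosen].T
        chosen.append(int(sim_to_chosen.max(axis=1).argmin()))
    centers = Z[chosen].copy()

    labels = np.full(len(Z), -1)
    for _ in range(n_iter):
        # Pearson correlation of every row with every center.
        sim = Z @ standardize_rows(centers).T
        new_labels = sim.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Move each center to the mean profile of its current members.
        for j in range(k):
            members = Z[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rising = np.linspace(0.0, 1.0, 8)
    # Five "rising" and five "falling" profiles with different scales.
    profiles = np.vstack(
        [s * rising for s in (1, 2, 3, 4, 5)]
        + [s * rising[::-1] for s in (1, 2, 3, 4, 5)]
    )
    profiles += rng.normal(scale=0.01, size=profiles.shape)
    labels, _ = correlation_center_clustering(profiles, k=2)
    print(labels)
```

Because Pearson correlation ignores offset and scale, profiles with the same shape but different magnitudes fall into the same group, which is the property a correlation-based similarity is meant to provide. A two-dimensional map of the resulting clusters could then be produced with any t-SNE implementation.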