Department of Fruit Tree Sciences, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, Rishon Lezion, Israel.
Bioinformatics. 2017 Jul 1;33(13):2053-2055. doi: 10.1093/bioinformatics/btx116.
A prerequisite for clustering noisy data, such as gene-expression data, is a filtering step. As an alternative to filtering, the ctsGE R package applies a sorting step that divides all of the data into small groups according to how each time point relates to the median of the time series. Clustering is then performed separately on each group, so it is done in two steps: first, an expression index (i.e. a sequence of 1, -1 and 0) is defined for each gene and genes with the same index are grouped together; then each group of genes is clustered by k-means to create subgroups. The ctsGE package also provides an interactive tool to visualize and explore the gene-expression patterns and their subclusters. ctsGE thus offers a way of organizing and exploring expression data without discarding valuable information.
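The two-step scheme can be sketched in Python. This is not the ctsGE API (ctsGE is an R package). It is a minimal illustration under stated assumptions: each gene's profile is centered on its median and scaled by the median absolute deviation (MAD); a time point is coded 1 if its scaled value exceeds `cutoff`, -1 if it falls below `-cutoff`, and 0 otherwise; and `k` subclusters are requested per index group. The function names `scale_by_median`, `expression_index`, `kmeans` and `cluster_time_series`, and the default `cutoff=1.0`, are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def scale_by_median(series):
    """Center a gene's time series on its median and scale by the MAD (assumed scaling)."""
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    scale = mad if mad > 0 else 1.0  # guard against flat (constant) profiles
    return [(x - med) / scale for x in series]


def expression_index(scaled, cutoff=1.0):
    """Code each time point as 1 (above), -1 (below) or 0 (near the median).

    The symmetric `cutoff` threshold is a hypothetical choice for illustration.
    """
    return tuple(1 if z > cutoff else -1 if z < -cutoff else 0 for z in scaled)


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))  # a group may contain fewer genes than k
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_time_series(data, k=2, cutoff=1.0):
    """Step 1: group genes by expression index. Step 2: k-means within each group."""
    scaled = {gene: scale_by_median(values) for gene, values in data.items()}
    groups = defaultdict(list)
    for gene, profile in scaled.items():
        groups[expression_index(profile, cutoff)].append(gene)
    result = {}
    for index, genes in groups.items():
        labels = kmeans([scaled[g] for g in genes], k)
        result[index] = dict(zip(genes, labels))
    return result
```

Calling `cluster_time_series(data, k=2)` on a dict that maps gene IDs to equal-length expression lists returns, for each index, a mapping from gene to subcluster label. Because k-means runs only within an index group, each clustering problem stays small, and no gene is dropped beforehand.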
Freely available as part of the Bioconductor project at https://bioconductor.org/packages/ctsGE/ .
Supplementary data are available at Bioinformatics online.