ctsGE-clustering subgroups of expression data.

Affiliation

Department of Fruit Tree Sciences, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, Rishon Lezion, Israel.

Publication information

Bioinformatics. 2017 Jul 1;33(13):2053-2055. doi: 10.1093/bioinformatics/btx116.

DOI:10.1093/bioinformatics/btx116

PMID:28334165

Abstract

SUMMARY

A pre-requisite to clustering noisy data, such as gene-expression data, is the filtering step. As an alternative to this step, the ctsGE R-package applies a sorting step in which all of the data are divided into small groups. The groups are divided according to how the time points are related to the time-series median. Then clustering is performed separately on each group. Thus, the clustering is done in two steps. First, an expression index (i.e. a sequence of 1, -1 and 0) is defined and genes with the same index are grouped together, and then each group of genes is clustered by k-means to create subgroups. The ctsGE package also provides an interactive tool to visualize and explore the gene-expression patterns and their subclusters. ctsGE proposes a way of organizing and exploring expression data without eliminating valuable information.
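
The two-step procedure described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the ctsGE R API: the function names (`expression_index`, `group_by_index`, `kmeans`, `two_step_clustering`), the toy profiles, and the `cutoff` parameter (here taken in raw expression units as a distance from the series median) are all assumptions made for the example. The package itself may scale the data and choose the threshold differently.

```python
import random
from collections import defaultdict
from statistics import median


def expression_index(series, cutoff):
    """Map each time point to 1, -1 or 0 relative to the series median.

    Assumption: `cutoff` is a hypothetical threshold in raw expression units.
    """
    med = median(series)
    index = []
    for value in series:
        if value - med > cutoff:
            index.append(1)       # clearly above the median
        elif med - value > cutoff:
            index.append(-1)      # clearly below the median
        else:
            index.append(0)       # within the cutoff of the median
    return tuple(index)


def group_by_index(profiles, cutoff):
    """Step 1: put genes that share the same expression index in one group."""
    groups = defaultdict(list)
    for gene, series in profiles.items():
        groups[expression_index(series, cutoff)].append(gene)
    return dict(groups)


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))           # a group may hold fewer than k genes
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_step_clustering(profiles, cutoff, k):
    """Step 2: run k-means separately inside every index group."""
    result = {}
    for index, genes in group_by_index(profiles, cutoff).items():
        labels = kmeans([list(profiles[g]) for g in genes], k)
        result[index] = dict(zip(genes, labels))
    return result


# Toy time series (four time points per gene), invented for illustration.
profiles = {
    "geneA": [1.0, 1.2, 5.0, 5.1],
    "geneB": [2.0, 2.1, 6.0, 6.3],
    "geneC": [5.0, 5.2, 1.0, 0.9],
    "geneD": [3.0, 3.1, 3.0, 2.9],
}
clusters = two_step_clustering(profiles, cutoff=1.0, k=2)
```

In this toy input, geneA and geneB share the index (-1, -1, 1, 1) and are then split by k-means inside that group, while geneD, whose values stay close to its median, receives the all-zero index and is kept rather than filtered out.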

AVAILABILITY AND IMPLEMENTATION

Freely available as part of the Bioconductor project at https://bioconductor.org/packages/ctsGE/ .

CONTACT

ron@agri.gov.il.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
