Hennetin Jérôme, Bellis Michel
CRBM-CNRS, Montpellier, France.
Methods Enzymol. 2006;411:387-407. doi: 10.1016/S0076-6879(06)11021-6.
With the development of data set repositories, it is now possible to collate large numbers of related results by gathering data from experiments that were carried out in different laboratories and that address similar questions or use a single type of biological material under different conditions. To address the challenge posed by the heterogeneous nature of multiple data sources, this chapter presents several methods used routinely for assessing the quality of data (i.e., reproducibility of replicates and similarity between experimental points obtained under identical or similar biological conditions). As gene clustering on large data sets is not straightforward, this chapter also presents a rapid gene clustering method that involves translating variation profiles from an ordered set of comparisons into chains of symbols. In addition, it shows that lists of genes assembled based on the presence of a common term in their functional description can be used to find the most informative comparisons and to construct from them exemplar chains of symbols that are useful for clustering similar genes. Finally, the chapter extends this symbolic approach to the overall set of biological conditions under study and shows how the resulting collection of variation profiles can be used to construct transcriptional networks, which in turn can be used as powerful tools for gene clustering.
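The idea of translating variation profiles into chains of symbols can be illustrated with a minimal sketch. It is not the chapter's implementation: the three-letter alphabet (`I` for increased, `D` for decreased, `N` for not changed), the log-ratio input, the threshold of 1.0, and the function names `to_symbol_chain` and `cluster_by_chain` are all assumptions made for illustration. Genes whose chains are identical over the same ordered set of comparisons end up in the same cluster.

```python
from collections import defaultdict


def to_symbol_chain(log_ratios, threshold=1.0):
    """Translate one gene's variations over an ordered set of comparisons
    into a chain of symbols (hypothetical alphabet: I, D, N)."""
    symbols = []
    for value in log_ratios:
        if value >= threshold:
            symbols.append("I")   # increased in this comparison
        elif value <= -threshold:
            symbols.append("D")   # decreased in this comparison
        else:
            symbols.append("N")   # no variation above the assumed threshold
    return "".join(symbols)


def cluster_by_chain(profiles, threshold=1.0):
    """Group genes that share exactly the same chain of symbols."""
    clusters = defaultdict(list)
    for gene, log_ratios in profiles.items():
        clusters[to_symbol_chain(log_ratios, threshold)].append(gene)
    return dict(clusters)


# Made-up values, for illustration only: three genes, three ordered comparisons.
profiles = {
    "geneA": [1.8, 0.1, -1.4],
    "geneB": [2.3, -0.2, -1.1],
    "geneC": [0.0, 0.3, 1.5],
}
clusters = cluster_by_chain(profiles)
```

Because clustering reduces to grouping identical strings, the cost grows linearly with the number of genes, which is one plausible reason a symbolic encoding is fast on large data sets.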