Miozzi Laura, Piro Rosario Michael, Rosa Fabio, Ala Ugo, Silengo Lorenzo, Di Cunto Ferdinando, Provero Paolo
Institute of Plant Virology, CNR, Turin, Italy.
PLoS One. 2008 Jun 18;3(6):e2439. doi: 10.1371/journal.pone.0002439.
BACKGROUND: High-throughput gene expression data can predict gene function through the "guilt by association" principle: coexpressed genes are likely to be functionally associated.
METHODOLOGY/PRINCIPAL FINDINGS: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are Ranked Coexpression Groups (RCGs), small sets of tightly coexpressed genes that are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin.
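The RCG and majority-rule steps can be illustrated with a small sketch. It is not the authors' implementation and rests on stated assumptions: an RCG is taken here to be a seed gene plus its `k` nearest neighbours under 1 - Pearson correlation (only one possible dissimilarity measure; the paper combines several and integrates microarray and SAGE data, which this sketch omits), and the "majority rule" is assumed to mean that a term characterizes a group when it annotates more than half of the group's annotated genes. All function names, `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson_dissimilarity(x, y):
    """1 - Pearson correlation of two expression profiles (0 = same shape, 2 = opposite)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 1.0 - cov / (sx * sy)


def ranked_group(seed, profiles, k, dissim=pearson_dissimilarity):
    """Hypothetical RCG: the seed gene plus its k least dissimilar genes."""
    others = sorted(
        (g for g in profiles if g != seed),
        key=lambda g: dissim(profiles[seed], profiles[g]),
    )
    return [seed] + others[:k]


def majority_terms(group, annotations):
    """Terms carried by more than half of the annotated genes in the group (assumed rule)."""
    annotated = [g for g in group if annotations.get(g)]
    counts = Counter(t for g in annotated for t in set(annotations[g]))
    return {t for t, c in counts.items() if c > len(annotated) / 2}


def predict_annotations(profiles, annotations, k=2):
    """Propagate each group's majority terms to members that lack them."""
    predictions = {}
    for seed in profiles:
        group = ranked_group(seed, profiles, k)
        for term in majority_terms(group, annotations):
            for g in group:
                if term not in annotations.get(g, set()):
                    predictions.setdefault(g, set()).add(term)
    return predictions


# Made-up toy data: two clusters of perfectly correlated profiles.
PROFILES = {
    "A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 5, 7],
    "D": [4, 3, 2, 1], "E": [8, 6, 4, 2], "F": [7, 5, 3, 1],
}
ANNOTATIONS = {"A": {"GO:X"}, "B": {"GO:X"}, "D": {"GO:Y"}, "E": {"GO:Y"}}

if __name__ == "__main__":
    print(predict_annotations(PROFILES, ANNOTATIONS, k=2))
```

On the toy data the sketch predicts GO:X for C and GO:Y for F, the unannotated member of each coexpression cluster. A real analysis would likely also require a minimum number of annotated genes per group, since under this simple rule a single annotated gene already forms a majority.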
CONCLUSIONS/SIGNIFICANCE: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly broadens the scope of the results. By combining gene expression data, functional annotation, and known phenotype-gene associations, we provide candidate genes for several genetic diseases of unknown molecular basis.
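The candidate-gene step can be sketched in the same hedged spirit. The assumptions here are hypothetical and not taken from the paper: the phenotype of unknown molecular basis is assumed to have been mapped to a chromosomal region whose genes are supplied as `locus_genes`; the genes already linked to similar phenotypes are supplied as `known_genes`; and each locus gene is scored by how many functionally characterized RCGs it shares with those known genes. The scoring rule and all names are illustrative only.

```python
from collections import Counter


def rank_candidates(rcgs, known_genes, locus_genes):
    """Score locus genes by the number of groups they share with known genes.

    rcgs: functionally characterized groups (lists of gene IDs).
    known_genes: genes already linked to phenotypes similar to the target one.
    locus_genes: genes in the region where the target phenotype was mapped (assumed input).
    """
    known = set(known_genes)
    locus = set(locus_genes)
    scores = Counter()
    for group in rcgs:
        members = set(group)
        if members & known:  # group contains a gene of a similar phenotype
            for gene in (members & locus) - known:
                scores[gene] += 1
    return scores.most_common()  # highest-scoring candidates first


# Hypothetical toy input.
RCGS = [["G1", "G2", "G3"], ["G2", "G4", "G5"], ["G6", "G7", "G8"]]
KNOWN = {"G1", "G4"}
LOCUS = ["G2", "G3", "G7", "G9"]

if __name__ == "__main__":
    print(rank_candidates(RCGS, KNOWN, LOCUS))
```

With this input, G2 scores highest because it shares two groups with known genes, G3 shares one, and G7 is not scored because its group contains no gene of a similar phenotype.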