Samsonova Anastasia A, Niranjan Mahesan, Russell Steven, Brazma Alvis
European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
PLoS Comput Biol. 2007 Jul;3(7):e144. doi: 10.1371/journal.pcbi.0030144.
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
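As an illustration only, the general idea of training a classifier on time-course expression profiles of genes with in situ annotations, and then predicting tissue labels for unannotated genes, might be sketched as follows. The abstract does not name the classifier used, so the nearest-centroid rule with Pearson correlation shown here is an assumed stand-in, not the authors' method. All gene names, tissue labels, and expression values are hypothetical toy data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_a * var_b)


def train_centroids(profiles, annotations):
    """Average the time-course profiles of annotated genes per tissue label.

    profiles:    gene -> list of expression values over time points
    annotations: gene -> tissue label (assumed to come from in situ data)
    """
    sums, counts = {}, {}
    for gene, tissue in annotations.items():
        profile = profiles[gene]
        if tissue not in sums:
            sums[tissue] = [0.0] * len(profile)
            counts[tissue] = 0
        sums[tissue] = [s + v for s, v in zip(sums[tissue], profile)]
        counts[tissue] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict_tissue(profile, centroids):
    """Assign the tissue whose centroid correlates best with the profile."""
    return max(centroids, key=lambda t: pearson(profile, centroids[t]))


# Hypothetical toy data: five time points per gene.
profiles = {
    "geneA": [0.1, 0.2, 1.5, 2.8, 3.0],  # rises late in development
    "geneB": [0.0, 0.3, 1.2, 2.5, 3.1],
    "geneC": [3.0, 2.6, 1.1, 0.4, 0.1],  # falls after early stages
    "geneD": [2.8, 2.9, 1.4, 0.2, 0.0],
    "geneX": [0.2, 0.1, 1.0, 2.0, 2.9],  # no in situ annotation
}
annotations = {
    "geneA": "tissue_late",
    "geneB": "tissue_late",
    "geneC": "tissue_early",
    "geneD": "tissue_early",
}

centroids = train_centroids(profiles, annotations)
prediction = predict_tissue(profiles["geneX"], centroids)
```

A real analysis would also need to handle genes expressed in several tissues, which this single-label sketch does not, and would validate predictions against held-out annotations. The abstract reports validation against FlyBase annotations, Gene Ontology overrepresentation, and gene-specific studies from the literature.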