Minakuchi Yohei, Ito Masahiro, Kohara Yuji
Genome Biology Laboratory, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan.
Bioinformatics. 2004 May 1;20(7):1097-109. doi: 10.1093/bioinformatics/bth045. Epub 2004 Feb 5.
A comprehensive gene expression database is essential for computer modeling and simulation of biological phenomena, including development. Development is a four-dimensional (4D; 3D structure plus time course) phenomenon. We are constructing a 4D database of gene expression for the early embryogenesis of the nematode Caenorhabditis elegans. As the framework of this 4D database, we have constructed computer graphics (CG), into which we will incorporate expression data for a number of genes at the subcellular level. However, assigning the 3D distribution of gene products (protein, mRNA) in embryos at various developmental stages is both difficult and tedious, so this process needs to be automated. For this purpose, we developed a new system, named SPI after its function of superimposing fluorescent confocal microscopic data onto a CG framework.
The scheme of this system comprises the following steps: (1) acquisition of serial sections (40 slices) of three-color fluorescent confocal images (4',6-diamidino-2-phenylindole (DAPI) for nuclei; indocarbocyanine (Cy-3) for the internal marker, the germline-specific protein POS-1; and indodicarbocyanine (Cy-5) for the gene product under examination); (2) identification of several features of the stained embryos, such as contour, developmental stage and position of the internal marker; (3) selection of CG images of the corresponding stage for template matching; (4) superimposition of the serial sections onto the CG; and (5) assignment of the positions of the superimposed gene products. Embryo contours were identified with the Snakes algorithm, with a detection accuracy of 92.1% for 2- to 28-cell-stage embryos. The developmental stage prediction method achieved an accuracy of 81.2% for 2- to 8-cell-stage embryos. For later-stage embryos, experimental noise made the prediction accuracy unsatisfactory, so the stages of these embryos were judged manually. Finally, our system chose the optimal CG and performed the superimposition and assignment of the gene product distribution. We established an initial 4D gene expression database with 56 maternal gene products.
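The abstract names the Snakes (active contour) algorithm for embryo contour detection but gives no details of the formulation used. As a minimal sketch only, assuming the greedy variant of Williams and Shah rather than the authors' own implementation, the code below shrinks a closed contour onto the strongest intensity edge of a 2D image. The function names `gradient_magnitude` and `greedy_snake`, the weights `alpha`, `beta` and `gamma`, the 3x3 search neighbourhood and the synthetic disk image are all hypothetical choices for illustration.

```python
import math


def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2D image (list of rows); borders stay 0."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            grad[y][x] = math.hypot(gx, gy)
    return grad


def greedy_snake(img, points, alpha=1.0, beta=1.0, gamma=1.2, max_iter=200):
    """Hypothetical greedy active contour (Williams & Shah style) on closed (x, y) points.

    Each point moves to the pixel in its 3x3 neighbourhood that minimises
    alpha*continuity + beta*curvature + gamma*image, with each term normalised
    within that neighbourhood. Stops when no point moves or after max_iter passes.
    """
    grad = gradient_magnitude(img)
    h, w = len(img), len(img[0])
    pts = [tuple(p) for p in points]
    n = len(pts)
    for _ in range(max_iter):
        # Mean spacing between consecutive points: the continuity target for this pass.
        d_mean = sum(math.dist(pts[j], pts[(j + 1) % n]) for j in range(n)) / n
        moved = 0
        for i in range(n):
            prev, nxt = pts[i - 1], pts[(i + 1) % n]
            x0, y0 = pts[i]
            cands = [(x0 + dx, y0 + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if 1 <= x0 + dx < w - 1 and 1 <= y0 + dy < h - 1]
            cont = [abs(d_mean - math.dist(prev, c)) for c in cands]
            curv = [(prev[0] - 2 * c[0] + nxt[0]) ** 2 + (prev[1] - 2 * c[1] + nxt[1]) ** 2
                    for c in cands]
            mags = [grad[c[1]][c[0]] for c in cands]
            g_min, g_max = min(mags), max(mags)
            cont_max, curv_max = max(cont) or 1.0, max(curv) or 1.0
            best, best_e = pts[i], math.inf
            for c, ec, ek, g in zip(cands, cont, curv, mags):
                # Image term in [-1, 0]: the strongest edge in the neighbourhood scores -1.
                e_img = (g_min - g) / (g_max - g_min) if g_max > g_min else 0.0
                e = alpha * ec / cont_max + beta * ek / curv_max + gamma * e_img
                if e < best_e:
                    best, best_e = c, e
            if best != pts[i]:
                pts[i] = best
                moved += 1
        if moved == 0:
            break
    return pts


# Hypothetical test image: a smooth disk of radius 10 centred at (20, 20).
img = [[1.0 / (1.0 + math.exp((math.hypot(x - 20, y - 20) - 10) / 2.0)) for x in range(40)]
       for y in range(40)]
# Start from a circle of radius 15 (24 points) and let the snake shrink onto the edge.
init = [(round(20 + 15 * math.cos(2 * math.pi * k / 24)),
         round(20 + 15 * math.sin(2 * math.pi * k / 24))) for k in range(24)]
contour = greedy_snake(img, init, gamma=10.0)
radii = [math.hypot(x - 20, y - 20) for x, y in contour]
```

With `gamma` set high, the edge term dominates and the points move inward until they sit close to the disk edge; lower values give the continuity and curvature terms more influence over the contour's shape, which may matter for noisy confocal slices.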
This system is available at http://anti.lab.nig.ac.jp/spi/ and http://anti.lab.nig.ac.jp/4ddb/