School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh EH8 9AB, UK.
Bioinformatics. 2011 Apr 15;27(8):1101-7. doi: 10.1093/bioinformatics/btr105. Epub 2011 Feb 25.
Deciphering the regulatory and developmental mechanisms of multicellular organisms requires detailed knowledge of gene interactions and gene expression. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in the mouse embryo provides a powerful resource for discovering the biological function of embryo organization. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development. If an image shows gene expression in an anatomical component, the image is tagged with the term for that component. The current annotation is done manually by domain experts, which is both time consuming and costly. In addition, the level of detail is variable, and errors inevitably arise from the tedious nature of the task. In this article, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms.
The method takes images from in situ hybridization studies and the ontology for the developing mouse embryo. It then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress study, where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. The accuracy of our method lies between 70% and 80% with few exceptions. We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was not correct. This shows that our method is robust against this kind of noise.
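The overall shape of such a pipeline (image features in, one present/absent classifier per anatomical term out) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the block-averaging features, the nearest-centroid classifier, the toy 4x4 images, and all function names (`block_features`, `train_term_classifiers`, `annotate`, `make_image`) are hypothetical stand-ins, not the feature extraction or classifiers the authors used.

```python
from statistics import mean


def block_features(image, block=2):
    """Average non-overlapping block x block tiles of a 2-D intensity grid.

    A stand-in for real image processing; chosen only for illustration.
    """
    rows, cols = len(image), len(image[0])
    feats = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = [image[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            feats.append(mean(tile))
    return feats


def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_term_classifiers(images, annotations, terms):
    """Build one binary nearest-centroid model per anatomical term.

    Assumes every term has at least one positive and one negative image.
    """
    feats = [block_features(img) for img in images]
    models = {}
    for term in terms:
        pos = [f for f, tags in zip(feats, annotations) if term in tags]
        neg = [f for f, tags in zip(feats, annotations) if term not in tags]
        models[term] = (centroid(pos), centroid(neg))
    return models


def annotate(image, models):
    """Return the terms whose 'present' centroid is closer than 'absent'."""
    f = block_features(image)
    return {term for term, (pos, neg) in models.items()
            if sq_dist(f, pos) < sq_dist(f, neg)}


def make_image(top_left, bottom_right):
    """Toy 4x4 image with signal in the top-left and bottom-right tiles."""
    img = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            img[i][j] = top_left
            img[i + 2][j + 2] = bottom_right
    return img


# Toy training set: by construction, "humerus" signal sits in the top-left
# tile and "tibia" signal in the bottom-right tile (purely illustrative).
train_images = [make_image(1.0, 0.0), make_image(0.0, 1.0),
                make_image(1.0, 1.0), make_image(0.0, 0.0)]
train_tags = [{"humerus"}, {"tibia"}, {"humerus", "tibia"}, set()]

models = train_term_classifiers(train_images, train_tags, ["humerus", "tibia"])
predicted = annotate(make_image(0.9, 0.0), models)
print(predicted)
```

Training a separate binary model per term lets one image receive several anatomical terms, or none, which matches the tagging task described above. A realistic system would replace both the features and the classifier with stronger choices.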
The annotation result and the experimental dataset in the article can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/.
Supplementary data are available at Bioinformatics online.