Angeles-Albores David, Lee Raymond Y N, Chan Juancarlos, Sternberg Paul W
HHMI and California Institute of Technology, Division of Biology and Biological Engineering, 1200 E California Blvd, Pasadena, 91125, USA.
BMC Bioinformatics. 2016 Sep 13;17(1):366. doi: 10.1186/s12859-016-1229-9.
Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information.
We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and built a web interface through which users can access it. Since a common drawback of ontology enrichment analyses is their verbosity, we developed a simple filtering algorithm that reduces the ontology size by an order of magnitude. We tuned these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show that our tool can discriminate between embryonic and larval tissues and can identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans.
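The abstract does not describe how the filtering algorithm works. As a purely illustrative sketch, one simple way to slim an ontology is to drop terms annotated with too few genes. The function name `filter_terms`, the `min_genes` threshold, and the toy annotations below are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, Set


def filter_terms(annotations: Dict[str, Set[str]], min_genes: int = 2) -> Dict[str, Set[str]]:
    """Keep only terms annotated with at least `min_genes` genes.

    Hypothetical filter for illustration; the published tool may use
    different or additional criteria.
    """
    return {term: genes for term, genes in annotations.items() if len(genes) >= min_genes}


# Toy tissue-to-gene annotations (invented identifiers).
toy_annotations = {
    "neuron": {"g1", "g2", "g3"},
    "intestine": {"g4", "g5"},
    "rare cell": {"g6"},
}

slim = filter_terms(toy_annotations, min_genes=2)
print(sorted(slim))
```

Raising `min_genes` trades resolution for brevity: fewer terms are tested, so the output is shorter, but sparsely annotated tissues can no longer be detected.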
Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
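The abstract does not name the statistical test used. A common choice for term enrichment is the hypergeometric test, sketched below with the standard library only. The helper names `hypergeom_sf` and `term_enrichment`, the gene identifiers, and the background size are assumptions for illustration; this is not the TEA package's API.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def term_enrichment(gene_set: Iterable[str],
                    annotations: Dict[str, Set[str]],
                    background: Set[str]) -> Dict[str, float]:
    """Return an uncorrected enrichment p-value per term."""
    query = set(gene_set) & background
    pvals = {}
    for term, genes in annotations.items():
        annotated = genes & background
        overlap = len(query & annotated)
        pvals[term] = hypergeom_sf(overlap, len(background), len(annotated), len(query))
    return pvals


# Toy example: 20 background genes, two tissue terms (all invented).
background = {f"g{i}" for i in range(1, 21)}
annotations = {
    "neuron": {"g1", "g2", "g3", "g4", "g5"},
    "intestine": {"g6", "g7", "g8", "g9", "g10"},
}
query = {"g1", "g2", "g3", "g4", "g11"}

pvals = term_enrichment(query, annotations, background)
for term, p in sorted(pvals.items(), key=lambda kv: kv[1]):
    print(f"{term}\t{p:.4g}")
```

Because many terms are tested at once, the raw p-values from a sketch like this would need a multiple-testing correction before any term is reported as enriched.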