Wren Jonathan D, Garner Harold R
Advanced Center for Genome Technology, Department of Botany and Microbiology, The University of Oklahoma, 620 Parrington Oval, Rm. 106, Norman, OK 73019, USA.
Bioinformatics. 2004 Jan 22;20(2):191-8. doi: 10.1093/bioinformatics/btg390.
There is a general scientific need to identify and evaluate what any given set of 'objects' (e.g. genes, phenotypes, chemicals, diseases) has in common. Whether the goal is to classify the set, expand upon it, or identify commonalities and functional groupings within it, informational needs can be diverse, and the best source for identifying relationships among a potentially heterogeneous set of objects is the scientific literature.
We first establish a network of related objects by their co-occurrence within MEDLINE records. A set of objects within this network can then be queried to identify shared relationships, and a method is presented to score their statistical relevance by comparing observed frequencies with what would be expected in a random network model. Using Gene Ontology (GO) categories, we demonstrate that this method enables a quantitative ranking of the 'cohesiveness' of a set of objects and, importantly, allows other objects related to this set to be identified and evaluated for their 'cohesion' to it.
Supplemental information: A list of ranked genes related to each GO category analyzed can be found at http://innovation.swmed.edu/IRIDESCENT/GO_relationships.htm
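The approach described above (build a co-occurrence network, query a set of objects, and compare observed shared relationships with a random expectation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_network` and `shared_relationships`, the toy records, and the random model used here (each link of an object is assumed equally likely to land on any other node) are all assumptions made for illustration, since the abstract does not specify the scoring formula.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records):
    """Link every pair of objects that co-occur in at least one record."""
    neighbors = defaultdict(set)
    for objects in records:
        for a, b in combinations(sorted(set(objects)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def shared_relationships(neighbors, query):
    """Rank objects outside `query` by observed/expected links to the query set."""
    query = set(query) & set(neighbors)
    n_nodes = len(neighbors)
    scores = {}
    for obj, linked in neighbors.items():
        if obj in query:
            continue
        observed = len(linked & query)
        if observed == 0:
            continue
        # Assumed random model: an object with k links is expected to reach
        # k * |query| / (N - 1) members of the query set by chance.
        expected = len(linked) * len(query) / (n_nodes - 1)
        scores[obj] = (observed, expected, observed / expected)
    # Highest observed/expected ratio first.
    return sorted(scores.items(), key=lambda kv: kv[1][2], reverse=True)


# Hypothetical records standing in for the objects found in MEDLINE entries.
records = [
    {"geneA", "apoptosis"},
    {"geneB", "apoptosis"},
    {"geneC", "apoptosis"},
    {"geneA", "cancer"},
    {"geneB", "cancer"},
    {"geneC", "cancer"},
    {"geneD", "cancer"},
    {"geneE", "cancer"},
    {"geneD", "geneE"},
]

network = build_network(records)
ranked = shared_relationships(network, {"geneA", "geneB", "geneC"})
for name, (observed, expected, ratio) in ranked:
    print(f"{name}: observed={observed}, expected={expected:.2f}, ratio={ratio:.2f}")
```

In this toy network both 'apoptosis' and 'cancer' are linked to all three queried genes, but 'cancer' is also linked to many other objects, so more of its links to the set would be expected by chance and it ranks lower. A highly connected object scoring lower than a specific one is the kind of behaviour a comparison against a random network model is meant to capture.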