Genes, themes and microarrays: using information retrieval for large-scale gene analysis.
Author information
Shatkay H, Edwards S, Wilbur W J, Boguski M
Affiliation
National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland 20894, USA.
Publication information
Proc Int Conf Intell Syst Mol Biol. 2000;8:317-28.
The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering does reveal potentially meaningful relationships among genes, but it cannot explain the underlying biological mechanisms. To address this problem, we have developed a new approach that uses the literature to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the results with well-established yeast gene functions demonstrates the effectiveness of our approach.
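The two steps named in the abstract, a similarity-based search in document space followed by a translation of abstract-level relationships into gene-level connections, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure or the scoring, so the sketch assumes TF-IDF weighting with cosine similarity, one "kernel" abstract per gene, and Jaccard overlap between the kernels' nearest-neighbor sets as the gene relatedness score. All function names, the `kernels` mapping, and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighborhood(index, vectors, k):
    """Indices of the k documents most similar to document `index` (itself included)."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda j: cosine(vectors[index], vectors[j]),
        reverse=True,
    )
    return set(ranked[:k])


def gene_relatedness(kernels, docs, k=3):
    """Score gene pairs by the Jaccard overlap of their kernels' neighborhoods.

    kernels: dict mapping a gene name to the index of its kernel abstract in docs
             (an assumed input format, chosen for illustration).
    Returns a dict mapping (gene_a, gene_b) to a score in [0, 1].
    """
    vectors = tf_idf_vectors(docs)
    hoods = {gene: neighborhood(i, vectors, k) for gene, i in kernels.items()}
    genes = sorted(hoods)
    scores = {}
    for pos, gene_a in enumerate(genes):
        for gene_b in genes[pos + 1:]:
            shared = hoods[gene_a] & hoods[gene_b]
            combined = hoods[gene_a] | hoods[gene_b]
            scores[(gene_a, gene_b)] = len(shared) / len(combined)
    return scores
```

Under these assumptions, two genes whose kernel abstracts retrieve largely the same set of related abstracts receive a score near 1, which is one simple way to turn content-based similarity among documents into a putative functional link between genes. A realistic pipeline would also need proper tokenization, stop-word removal, and a much larger corpus.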