Xuan Weijian, Wang Pinglang, Watson Stanley J, Meng Fan
Molecular and Behavioral Neuroscience Institute and Department of Psychiatry, University of Michigan, Ann Arbor, Michigan 48109, USA.
Bioinformatics. 2007 Sep 15;23(18):2477-84. doi: 10.1093/bioinformatics/btm375. Epub 2007 Sep 6.
Genome-wide, high-density SNP association studies are expected to identify many SNP alleles associated with complex disorders. Understanding the biological significance of these alleles in the context of the existing literature is a major challenge, because existing search engines are not designed to search the literature for SNPs or other genetic markers. Literature mining of gene and protein functions has received significant attention and effort, whereas similar work on genetic markers and their related diseases is still in its infancy. Our goal is to develop a web-based tool that facilitates the mining of Medline literature related to genetic studies and gene/protein function studies. Our solution consists of four main function modules: (1) identifying different types of genetic markers or genetic variations in Medline records; (2) distinguishing positive from negative linkage or association between genetic markers and diseases; (3) integrating marker genomic location data from different databases to enable retrieval of Medline records related to markers in the same linkage disequilibrium region; and (4) a web interface called MarkerInfoFinder to search, display, sort and download Medline citation results. Tests using published data suggest that MarkerInfoFinder can significantly increase the efficiency of finding genetic disorders and their underlying molecular mechanisms. The functions we developed will also be used to build a knowledge base for genetic markers and diseases.
MarkerInfoFinder is publicly available at http://brainarray.mbni.med.umich.edu/brainarray/datamining/MarkerInfoFinder.
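As a rough illustration of what modules (1) and (2) involve, the sketch below tags marker mentions in a piece of abstract text and guesses whether a sentence reports a positive or negative finding. It is not the authors' implementation. The two patterns cover only dbSNP-style rs identifiers and D-number microsatellite names, and the cue-phrase list, the function names `find_markers` and `association_polarity`, and the sample sentences are all assumptions made for this example.

```python
import re

# Assumed patterns (illustrative only): dbSNP reference IDs such as "rs12345"
# and microsatellite markers in the D-number style such as "D6S1610".
RS_ID = re.compile(r"\brs\d+\b", re.IGNORECASE)
D_MARKER = re.compile(r"\bD(?:\d{1,2}|X|Y)S\d+\b")

# Hypothetical cue phrases that signal a negative linkage/association finding.
NEGATIVE_CUES = (
    "no association",
    "not associated",
    "no linkage",
    "no evidence of",
    "failed to",
)


def find_markers(text):
    """Return the distinct marker names mentioned in text, sorted."""
    markers = {m.group(0).lower() for m in RS_ID.finditer(text)}
    markers |= {m.group(0) for m in D_MARKER.finditer(text)}
    return sorted(markers)


def association_polarity(sentence):
    """Label a sentence 'negative' if it contains a negative cue, else 'positive'."""
    lowered = sentence.lower()
    if any(cue in lowered for cue in NEGATIVE_CUES):
        return "negative"
    return "positive"


if __name__ == "__main__":
    # Invented sample sentences, not taken from any Medline record.
    sample = "SNP rs12345 and marker D6S1610 were tested; RS12345 replicated."
    print(find_markers(sample))
    print(association_polarity("No association was found between rs12345 and the disorder."))
```

A keyword rule like this misclassifies hedged or doubly negated sentences, so a real system would need a richer classifier. The sketch only shows the shape of the task: recognise the marker names first, then decide the direction of the reported finding.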