Erdogmus Muge, Sezerman Osman Ugur
Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey.
J Bioinform Comput Biol. 2007 Dec;5(6):1261-75. doi: 10.1142/s021972000700317x.
To better understand the mechanisms of disease development, it is crucial to know which mutations occur and in which genes they occur. Information on disease-related mutations can be obtained from public databases or from the biomedical literature. However, retrieving information from these resources is problematic for two reasons: manually created databases are usually incomplete and out of date, and reading through the vast number of publicly available biomedical documents is very time-consuming. In this paper, we describe MuGeX (Mutation Gene eXtractor), a system that automatically extracts mutation-gene pairs from Medline abstracts for a disease query. The system was tested on a corpus of 231 Medline abstracts. For mutation detection alone, recall is 85.9% and precision is 95.9%. For the extraction of mutation-gene pairs, we focus on Alzheimer's disease; recall for mutation-gene pair identification is estimated at 91.3% and precision at 88.9%. Through automatic extraction, MuGeX overcomes the problems of retrieving information from public resources and reduces the time required to access relevant information, while preserving the accuracy of the retrieved information.
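The abstract does not describe MuGeX's extraction rules, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. Point mutations are recognized by a regular expression for the wild-type/position/mutant notation (e.g. A246E or Ala246Glu); genes come from a hypothetical four-entry lexicon (`GENE_LEXICON`); and each mutation is paired with the closest gene mention in the same sentence, a simple co-occurrence heuristic chosen here for illustration. The helper `precision_recall` shows how precision and recall over extracted pairs are defined. All function and variable names are hypothetical.

```python
import re

# Standard one-letter and three-letter amino acid codes.
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"

# Point mutations written as wild-type residue, position, mutant residue,
# e.g. "A246E" or "Ala246Glu" (assumed notation, not MuGeX's actual rules).
MUTATION_RE = re.compile(rf"\b(?:[{AA1}]\d+[{AA1}]|(?:{AA3})\d+(?:{AA3}))\b")

# Hypothetical toy gene lexicon; a real system would need a full gene dictionary.
GENE_LEXICON = {"APP", "PSEN1", "PSEN2", "APOE"}
GENE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(GENE_LEXICON, key=len, reverse=True))) + r")\b"
)


def extract_pairs(text):
    """Pair each mutation with the closest gene mention in the same sentence."""
    pairs = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = [(m.start(), m.group(1)) for m in GENE_RE.finditer(sentence)]
        if not genes:
            continue  # no gene in this sentence, so no pair is proposed
        for mut in MUTATION_RE.finditer(sentence):
            # Choose the gene whose start offset is nearest to the mutation's.
            _, gene = min(genes, key=lambda g: abs(g[0] - mut.start()))
            pairs.append((mut.group(0), gene))
    return pairs


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|, over unique pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


abstract = ("The A246E mutation was found in PSEN1. "
            "Carriers of the V717I mutation in APP also developed the disease.")
pairs = extract_pairs(abstract)
print(pairs)  # [('A246E', 'PSEN1'), ('V717I', 'APP')]
print(precision_recall(pairs, [("A246E", "PSEN1"), ("V717I", "APP")]))  # (1.0, 1.0)
```

Nearest-mention pairing is brittle when one sentence names several genes and several mutations, so a practical extractor would likely need richer contextual cues than this sketch uses.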