Semantic Software Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada.
BMC Genomics. 2012 Jun 18;13 Suppl 4(Suppl 4):S10. doi: 10.1186/1471-2164-13-S4-S10.
Mutations, as sources of evolution, have long been a focus of the biomedical literature. Access to information about mutations and their impacts on protein properties facilitates research in domains such as enzymology and pharmacology. However, manually curating the rich and fast-growing body of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges remain in grounding impacts to their respective mutations and in recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities.
We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguate genes and proteins. Our system then detects mutation series and uses novel heuristics to ground the detected impacts to the correct mutations. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects, and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1% and a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data, and end-user and developer documentation, is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner.
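To make the idea of grounding an impact to a mutation concrete, the following is a minimal Python sketch. It is not the OMM implementation and does not reproduce its heuristics: the regular expressions, the short impact keyword list, and the "nearest mutation in the same sentence" rule are all assumptions chosen for illustration, and the names `MUTATION`, `IMPACT`, `normalize`, and `ground_impacts` are hypothetical.

```python
import re

# Three-letter to one-letter amino acid codes (the 20 standard residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# A residue is either a three-letter code or a one-letter code.
_RESIDUE = (
    "(?:" + "|".join(THREE_TO_ONE) + "|[" + "".join(THREE_TO_ONE.values()) + "])"
)

# Assumed surface form of a single-point mutation: wild type, position, mutant
# (for example "A123T" or "Gly45Asp"). Real text contains many more variants.
MUTATION = re.compile(
    rf"\b(?P<wt>{_RESIDUE})(?P<pos>[1-9]\d*)(?P<mut>{_RESIDUE})\b"
)

# Assumed, deliberately tiny list of impact trigger words.
IMPACT = re.compile(
    r"\b(increased|decreased|reduced|enhanced|abolished)\b", re.IGNORECASE
)


def normalize(match):
    """Return the mutation in one-letter form, e.g. 'Gly45Asp' -> 'G45D'."""
    wt = THREE_TO_ONE.get(match["wt"], match["wt"])
    mut = THREE_TO_ONE.get(match["mut"], match["mut"])
    return f"{wt}{match['pos']}{mut}"


def ground_impacts(text):
    """Pair each impact word with the nearest mutation in the same sentence."""
    pairs = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mutations = list(MUTATION.finditer(sentence))
        if not mutations:
            continue  # nothing to ground the impact to in this sentence
        for impact in IMPACT.finditer(sentence):
            nearest = min(
                mutations, key=lambda m: abs(m.start() - impact.start())
            )
            pairs.append((impact.group(1).lower(), normalize(nearest)))
    return pairs
```

Called on the text "The A123T variant showed reduced activity. Stability increased for Gly45Asp.", `ground_impacts` pairs "reduced" with A123T and "increased" with G45D. A character-distance rule of this kind breaks down quickly on mutation series, coreference, and impacts stated in a different sentence from the mutation, which are the grounding problems the paragraph above describes.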
We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extracting mutation impacts and related information from the biomedical literature. We assessed the performance of our work on manually annotated corpora, and the results show the reliability of our approach. Representing the extracted information in a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end-user interaction.