A workflow for mutation extraction and structure annotation.

Author information

Kanagasabai Rajaraman, Choo Khar Heng, Ranganathan Shoba, Baker Christopher J O

Affiliation

Department of Data Mining, Institute for Infocomm Research, Singapore, Singapore.

Publication information

J Bioinform Comput Biol. 2007 Dec;5(6):1319-37. doi: 10.1142/s0219720007003119.

Abstract

Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow that uses natural language processing (NLP) techniques to mine mutation annotations from full-text biomedical literature and then reuses them for protein structure annotation and visualization. The system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed both to aggregate information and to broker the resulting mutation annotations. It coordinates semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations, which are populated as instances of object and datatype properties. mSTRAPviz is a subsystem that brokers structure information and the associated mutations for visualization. For mutated sequences with no corresponding structure in the Protein Data Bank (PDB), an automated homology modeling pipeline was developed to generate theoretical models. With mSTRAP, we demonstrate a workable system that can automate the workflow for retrieving, extracting, processing, and visualizing mutation annotations, tasks that are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at http://datam.i2r.a-star.edu.sg/mstrap.
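The abstract does not say how mutation mentions are recognized in text or grounded to a protein sequence. As a rough illustration of those two steps only, the sketch below assumes mutations are written in the common wild-type/position/mutant form (e.g. A123G or Ala123Gly) and treats a mention as grounded when its wild-type residue matches the sequence at that position. The regular expressions, the names `PointMutation`, `extract_point_mutations`, and `validate_against_sequence`, and the example text and sequence are all hypothetical; this is not mSTRAP's actual implementation.

```python
import re
from dataclasses import dataclass

# Standard three-letter to one-letter amino acid codes (20 canonical residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(THREE_TO_ONE.values())

# Hypothetical, deliberately naive patterns: "A123G" or "Ala123Gly" as whole words.
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
_THREE = "|".join(THREE_TO_ONE)
THREE_LETTER_RE = re.compile(rf"\b({_THREE})(\d+)({_THREE})\b")


@dataclass(frozen=True)
class PointMutation:
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based residue position as written in the text
    mutant: str     # one-letter code of the substituted residue

    def __str__(self) -> str:
        return f"{self.wild_type}{self.position}{self.mutant}"


def extract_point_mutations(text: str) -> list[PointMutation]:
    """Return distinct point mutations in order of first mention."""
    hits = [
        (m.start(), PointMutation(m[1], int(m[2]), m[3]))
        for m in ONE_LETTER_RE.finditer(text)
    ]
    hits += [
        (m.start(), PointMutation(THREE_TO_ONE[m[1]], int(m[2]), THREE_TO_ONE[m[3]]))
        for m in THREE_LETTER_RE.finditer(text)
    ]
    hits.sort(key=lambda hit: hit[0])  # merge both patterns by text offset
    # dict.fromkeys keeps the first occurrence of each mutation, in order.
    return list(dict.fromkeys(mutation for _, mutation in hits))


def validate_against_sequence(mutations, sequence):
    """Split mutations by whether the wild-type residue matches the sequence."""
    valid, invalid = [], []
    for mutation in mutations:
        index = mutation.position - 1  # 1-based text position -> 0-based index
        if 0 <= index < len(sequence) and sequence[index] == mutation.wild_type:
            valid.append(mutation)
        else:
            invalid.append(mutation)
    return valid, invalid


if __name__ == "__main__":
    text = "Mutants A4G and Tyr5Phe were made; A4G was inactive, and K9E was not tested."
    mutations = extract_point_mutations(text)
    valid, invalid = validate_against_sequence(mutations, "MKTAYIAKQR")
    print([str(m) for m in valid])    # ['A4G', 'Y5F']
    print([str(m) for m in invalid])  # ['K9E']
```

Rejecting a mention such as K9E, whose stated wild-type residue does not match the toy sequence, is one simple way to filter false positives or numbering errors; the abstract does not describe how mSTRAP itself handles such mismatches.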
