Ohta Tomoko, Pyysalo Sampo, Miwa Makoto, Tsujii Jun'ichi
Department of Computer Science, University of Tokyo, Tokyo, Japan.
J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S2. doi: 10.1186/2041-1480-2-S5-S2.
We consider the task of automatically extracting DNA methylation events from the biomedical domain literature. DNA methylation is a key mechanism of epigenetic control of gene expression and is implicated in many cancers, but there has been little study of automatic information extraction for DNA methylation.
We present an annotation scheme for DNA methylation following the representation of the BioNLP shared task on event extraction, select a set of 200 abstracts including a representative sample of all PubMed citations relevant to DNA methylation, and introduce manual annotation for this corpus, marking nearly 3000 gene/protein mentions and 1500 DNA methylation and demethylation events. We retrain a state-of-the-art event extraction system on the corpus and find that automatic extraction of DNA methylation events, the methylated genes, and their methylation sites can be performed at 78% precision and 76% recall.
Our results demonstrate that reliable extraction methods for DNA methylation events can be created through corpus annotation and straightforward retraining of a general event extraction system. The introduced resources are freely available for research use from the GENIA project homepage, http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA.
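As an illustration of how precision and recall figures of the kind reported above are typically computed, the following is a minimal sketch that compares a set of predicted events against a set of gold-standard events by exact match. It is not the evaluation procedure of the system described here: the `Event` structure, the event type labels, the gene and site strings, and the exact-match criterion are all assumptions made for the example, and the toy data is invented.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Event(NamedTuple):
    """Hypothetical event record: type, methylated gene, optional site."""
    event_type: str          # assumed labels, e.g. "DNA_methylation"
    theme: str               # gene/protein mention (toy identifiers below)
    site: Optional[str] = None


def precision_recall_f1(gold: Set[Event],
                        predicted: Set[Event]) -> Tuple[float, float, float]:
    """Score predicted events against gold events by exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy data, for illustration only.
gold = {
    Event("DNA_methylation", "GENE_A", "promoter"),
    Event("DNA_methylation", "GENE_B", "CpG island"),
    Event("DNA_demethylation", "GENE_C"),
    Event("DNA_methylation", "GENE_D"),
}
predicted = {
    Event("DNA_methylation", "GENE_A", "promoter"),
    Event("DNA_methylation", "GENE_B", "CpG island"),
    Event("DNA_demethylation", "GENE_C"),
    Event("DNA_methylation", "GENE_E"),  # spurious prediction
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Applying the same harmonic mean to the reported 78% precision and 76% recall would give an F-score of roughly 77%.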