MeInfoText 2.0: gene methylation and cancer relation extraction from biomedical literature.

Author information

Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan.

Publication information

BMC Bioinformatics. 2011 Dec 14;12:471. doi: 10.1186/1471-2105-12-471.

Abstract

BACKGROUND

DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In a previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract gene methylation-cancer relations.

DESCRIPTION

Two maximum entropy models are trained to predict whether aberrant gene methylation is related to any type of cancer mentioned in the literature. In 10-fold cross-validation, the average precision/recall rates of the two models are 94.7%/90.1% and 91.8%/90%, respectively. MeInfoText 2.0 provides the gene methylation profiles of different types of human cancer. The extracted relations with maximum probability, evidence sentences, and specific gene information are also retrievable. The database is available at http://bws.iis.sinica.edu.tw:8081/MeInfoText2/.
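To make the evaluation protocol concrete, the following is a minimal sketch of a binary maximum entropy classifier (equivalent to logistic regression over indicator features) scored by k-fold cross-validation with averaged precision and recall. It is not the authors' implementation: the feature names, the stochastic gradient training loop, the 0.5 decision threshold, and the helper names `train_maxent`, `precision_recall`, and `cross_validate` are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set, Tuple

Features = Set[str]  # each example is a set of active indicator features


def sigmoid(z: float) -> float:
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(examples: Sequence[Features], labels: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> Callable[[Features], float]:
    """Fit a binary maximum entropy model by stochastic gradient ascent.

    Returns a function mapping a feature set to P(related | features).
    """
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            p = sigmoid(sum(weights[f] for f in feats))
            for f in feats:
                # Log-likelihood gradient for an indicator feature: (y - p).
                weights[f] += lr * (y - p)

    def predict_proba(feats: Features) -> float:
        return sigmoid(sum(weights.get(f, 0.0) for f in feats))

    return predict_proba


def precision_recall(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(examples: Sequence[Features], labels: Sequence[int],
                   k: int = 10) -> Tuple[float, float]:
    """Average precision and recall over k folds (fold i holds indices i, i+k, ...)."""
    folds: List[List[int]] = [list(range(i, len(examples), k)) for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_x = [x for i, x in enumerate(examples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        model = train_maxent(train_x, train_y)
        pred = [1 if model(examples[i]) >= 0.5 else 0 for i in held_out]
        gold = [labels[i] for i in held_out]
        scores.append(precision_recall(gold, pred))
    avg_precision = sum(p for p, _ in scores) / k
    avg_recall = sum(r for _, r in scores) / k
    return avg_precision, avg_recall
```

Calling `cross_validate(examples, labels, k=10)` on a labelled set of sentence-level feature sets returns the fold-averaged precision and recall, which is the kind of figure reported above. A real system would use far richer features and a regularized optimizer.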

CONCLUSION

The previous version, MeInfoText, was developed by using association rules, whereas MeInfoText 2.0 is based on a new framework that combines machine learning, dictionary lookup and pattern matching for epigenetics information extraction. The results of experiments show that MeInfoText 2.0 outperforms existing tools in many respects. To the best of our knowledge, this is the first study that uses a hybrid approach to extract gene methylation-cancer relations. It is also the first attempt to develop a gene methylation and cancer relation corpus.
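As a rough illustration of how dictionary lookup and pattern matching might feed a classifier in such a hybrid framework, the sketch below generates candidate gene-cancer pairs from a sentence. The dictionaries, the methylation regular expression, and the function names are hypothetical and far smaller than anything a production system would use. The abstract does not describe the actual pipeline at this level of detail.

```python
import re
from itertools import product
from typing import List, Set, Tuple

# Hypothetical toy dictionaries; real lexicons would be much larger.
GENE_DICT: Set[str] = {"CDKN2A", "MLH1", "BRCA1"}
CANCER_DICT: Set[str] = {"gastric cancer", "colorectal cancer", "breast cancer"}

# Assumed trigger pattern for methylation mentions.
METHYLATION_PATTERN = re.compile(r"\b(?:hyper|hypo)?methylat(?:ion|ed)\b", re.IGNORECASE)


def find_terms(sentence: str, dictionary: Set[str]) -> List[str]:
    """Dictionary lookup: return dictionary terms that occur as whole words."""
    lowered = sentence.lower()
    return sorted(
        term for term in dictionary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    )


def candidate_relations(sentence: str) -> List[Tuple[str, str]]:
    """Pair every gene with every cancer in a sentence that mentions methylation."""
    if not METHYLATION_PATTERN.search(sentence):
        return []
    genes = find_terms(sentence, GENE_DICT)
    cancers = find_terms(sentence, CANCER_DICT)
    return list(product(genes, cancers))
```

Each candidate pair, together with its evidence sentence, could then be passed to a trained classifier that decides whether the sentence actually asserts a methylation-cancer relation.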

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b2f/3266364/ccc410df293f/1471-2105-12-471-1.jpg
