MeInfoText 2.0: gene methylation and cancer relation extraction from biomedical literature.

Affiliation

Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan.

Publication information

BMC Bioinformatics. 2011 Dec 14;12:471. doi: 10.1186/1471-2105-12-471.

DOI:10.1186/1471-2105-12-471

PMID:22168213

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3266364/

Abstract

BACKGROUND

DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In a previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract gene methylation-cancer relations.

DESCRIPTION

Two maximum entropy models are trained to predict whether aberrant gene methylation is related to any type of cancer mentioned in the literature. Evaluated with 10-fold cross-validation, the two models achieve average precision/recall rates of 94.7%/90.1% and 91.8%/90%, respectively. MeInfoText 2.0 provides the gene methylation profiles of different types of human cancer. The extracted relations with maximum probability, evidence sentences, and specific gene information are also retrievable. The database is available at http://bws.iis.sinica.edu.tw:8081/MeInfoText2/.
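The evaluation protocol described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary maximum entropy model (which is equivalent to logistic regression), trains it by plain gradient ascent, and reports precision and recall averaged over k folds. The two binary features, the toy data, and all function names (`train_maxent`, `cross_validate`, `toy_data`) are hypothetical and chosen only for illustration.

```python
import math
import random


def predict_proba(w, x):
    """Probability of the positive class; the last weight is the bias."""
    z = sum(wj * v for wj, v in zip(w, x + [1.0]))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(X, y, epochs=300, lr=1.0):
    """Fit a binary maximum entropy model by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for j, v in enumerate(xi + [1.0]):
                grad[j] += err * v
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(X, y, k=10, seed=0):
    """Average precision and recall over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_maxent([X[i] for i in train], [y[i] for i in train])
        preds = [1 if predict_proba(w, X[i]) >= 0.5 else 0 for i in held_out]
        scores.append(precision_recall([y[i] for i in held_out], preds))
    avg_p = sum(p for p, _ in scores) / k
    avg_r = sum(r for _, r in scores) / k
    return avg_p, avg_r


def toy_data(n=40):
    """Hypothetical features: [methylation trigger present, gene and cancer co-mentioned]."""
    combos = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    X = [list(combos[i % 4]) for i in range(n)]
    y = [1 if x == [1.0, 1.0] else 0 for x in X]
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    avg_p, avg_r = cross_validate(X, y, k=10)
    print(f"average precision={avg_p:.3f}, average recall={avg_r:.3f}")
```

Note that a fold with no positive examples scores a recall of 0.0 under this convention, so a stratified split would likely be preferable on a small or imbalanced corpus.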

CONCLUSION

The previous version, MeInfoText, was developed by using association rules, whereas MeInfoText 2.0 is based on a new framework that combines machine learning, dictionary lookup and pattern matching for epigenetics information extraction. The results of experiments show that MeInfoText 2.0 outperforms existing tools in many respects. To the best of our knowledge, this is the first study that uses a hybrid approach to extract gene methylation-cancer relations. It is also the first attempt to develop a gene methylation and cancer relation corpus.
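As a rough illustration of how dictionary lookup and pattern matching might feed a machine learning step in a hybrid pipeline of this kind, the sketch below selects candidate gene-cancer pairs from a sentence. It is an assumption-laden toy, not the MeInfoText 2.0 code: the lexicons `GENE_DICT` and `CANCER_DICT`, the trigger pattern, and the function names are all hypothetical, and a real system would pass each candidate pair to a trained classifier for scoring.

```python
import re
from itertools import product

# Hypothetical lexicons; a real system would load curated gene and cancer dictionaries.
GENE_DICT = {"MLH1", "CDKN2A", "BRCA1"}
CANCER_DICT = {"colorectal cancer", "breast cancer", "gastric cancer"}

# Hypothetical trigger pattern for methylation mentions.
METHYLATION_PATTERN = re.compile(
    r"\b(?:hyper|hypo)?methylat(?:ion|ed)\b", re.IGNORECASE
)


def lookup(sentence, dictionary, ignore_case):
    """Return dictionary terms that occur in the sentence as whole words."""
    flags = re.IGNORECASE if ignore_case else 0
    hits = []
    for term in sorted(dictionary):
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, flags):
            hits.append(term)
    return hits


def candidate_pairs(sentence):
    """Gene-cancer pairs in a sentence that also contains a methylation trigger."""
    if not METHYLATION_PATTERN.search(sentence):
        return []
    genes = lookup(sentence, GENE_DICT, ignore_case=False)
    cancers = lookup(sentence, CANCER_DICT, ignore_case=True)
    return list(product(genes, cancers))


if __name__ == "__main__":
    example = "MLH1 promoter hypermethylation is frequent in colorectal cancer."
    print(candidate_pairs(example))
```

Gene symbols are matched case-sensitively here because many symbols collide with ordinary words when case is ignored; that choice is a design assumption of this sketch rather than something stated in the abstract.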
