Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature.

Affiliations

Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, ROC.

Big Data Laboratories, Chunghwa Telecom Co., Taoyuan, Taiwan, ROC.

Publication information

Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz030.

DOI:10.1093/database/baz030

PMID:30809637

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6391575/

Abstract

The detection of microRNA (miRNA) mentions in scientific literature helps researchers find relevant literature through queries formulated with miRNA information. Because most published biological studies describe signal transduction pathways or genetic regulatory information in figure captions, extracting miRNA mentions from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analysis of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components of the text mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous approaches based on patterns, dictionaries and supervised learning, and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, which makes it possible to understand why the system considers a span of text to be a miRNA mention, and this represented knowledge can be further complemented by domain experts. We studied the ambiguity level of miRNA nomenclature in order to connect miRNA mentions to the Rfam database, and evaluated our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, the latter of which we extended with additional Rfam normalization information. Our study highlights the challenges associated with miRNA identification and normalization in scientific literature and the research gaps that remain to be explored in future studies.
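
To make the recognition and normalization task concrete, the following is a minimal sketch of only the pattern and dictionary ingredients mentioned in the abstract. It is not the authors' statistical method. The regular expression, the function names (`family_key`, `find_mirnas`) and the identifiers in `FAMILY_IDS` are all assumptions made for illustration; in particular, the identifiers are placeholders and not real Rfam accessions.

```python
import re

# Simplified, assumed pattern for mentions such as "miR-21", "hsa-miR-155-5p",
# "microRNA-34a" or "let-7a". Real nomenclature has many more variants.
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?"                  # optional three-letter species prefix, e.g. "hsa-"
    r"(?P<stem>miR|miRNA|microRNA|let)"  # name stem
    r"-?(?P<num>\d+)"                    # family number
    r"(?P<suffix>[a-z]?)"                # optional member letter, e.g. the "a" in "34a"
    r"(?:-\d+)?"                         # optional locus number, e.g. "-1"
    r"(?:-[35]p)?\b",                    # optional arm, "-5p" or "-3p"
    re.IGNORECASE,
)

# Hypothetical lookup table: family key -> placeholder identifier.
FAMILY_IDS = {
    "mir-21": "FAMILY_ID_A",
    "let-7": "FAMILY_ID_B",
}


def family_key(match):
    """Collapse a matched mention to a family-level key such as 'mir-21'."""
    stem = "let" if match.group("stem").lower() == "let" else "mir"
    return f"{stem}-{int(match.group('num'))}"


def find_mirnas(text, family_ids):
    """Return (mention, start offset, family key, identifier or None) tuples."""
    results = []
    for match in MIRNA_PATTERN.finditer(text):
        key = family_key(match)
        results.append((match.group(0), match.start(), key, family_ids.get(key)))
    return results


if __name__ == "__main__":
    caption = ("Expression of hsa-miR-21-5p and let-7a was reduced, "
               "whereas microRNA-155 increased.")
    for hit in find_mirnas(caption, FAMILY_IDS):
        print(hit)
```

Mentions whose family key is missing from the table are returned with `None` as the identifier, which keeps recognition separate from normalization. A pure pattern approach like this cannot resolve ambiguous names, and that ambiguity is the gap the abstract says the proposed method addresses.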

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bd7/6391575/a630ff12cda8/baz030f1.jpg

Similar articles

miRSel: automated extraction of associations between microRNAs and genes from the biomedical literature.

BMC Bioinformatics. 2010 Mar 16;11:135. doi: 10.1186/1471-2105-11-135.

Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S1. doi: 10.1186/gb-2008-9-s2-s1. Epub 2008 Sep 1.

Detecting miRNA Mentions and Relations in Biomedical Literature.

F1000Res. 2014 Aug 28;3:205. doi: 10.12688/f1000research.4591.3. eCollection 2014.

SPRENO: a BioC module for identifying organism terms in figure captions.

Database (Oxford). 2018 Jan 1;2018. doi: 10.1093/database/bay048.

ProNormz--an integrated approach for human proteins and protein kinases normalization.

J Biomed Inform. 2014 Feb;47:131-8. doi: 10.1016/j.jbi.2013.10.003. Epub 2013 Oct 19.

Collective instance-level gene normalization on the IGN corpus.

PLoS One. 2013 Nov 25;8(11):e79517. doi: 10.1371/journal.pone.0079517. eCollection 2013.

Terminological resources for text mining over biomedical scientific literature.

Artif Intell Med. 2011 Jun;52(2):107-14. doi: 10.1016/j.artmed.2011.04.011. Epub 2011 Jun 11.

Computational Resources for Prediction and Analysis of Functional miRNA and Their Targetome.

Methods Mol Biol. 2019;1912:215-250. doi: 10.1007/978-1-4939-8982-9_9.

microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.

Interdiscip Sci. 2017 Sep;9(3):357-377. doi: 10.1007/s12539-016-0166-7. Epub 2016 Mar 28.

Cited by

Posterior cingulate cortex reveals an expression profile of resilience in cognitively intact elders.

Brain Commun. 2022 Jun 21;4(4):fcac162. doi: 10.1093/braincomms/fcac162. eCollection 2022.
