Balderas-Martínez Yalbi Itzel, Rinaldi Fabio, Contreras Gabriela, Solano-Lira Hilda, Sánchez-Pérez Mishael, Collado-Vides Julio, Selman Moisés, Pardo Annie
Facultad de Ciencias, Departamento Biología Celular, Universidad Nacional Autónoma de México, Ciudad Universitaria, Circuito Exterior s/n, Coyoacán, CP 04510, Ciudad de México, CDMX, México.
CONACYT-INER Ismael Cosío Villegas, Departamento Investigación, Calzada de Tlalpan 4502 Sección XVI, Tlalpan, CP Ciudad de México, CDMX, México.
Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax030.
MicroRNAs (miRNAs) are small non-coding RNA molecules that inhibit gene expression post-transcriptionally. They play important roles in several biological processes, and in recent years there has been growing interest in how they relate to the pathogenesis of diseases. Although some databases already contain information on miRNAs and their relation to diseases, curating them is a significant challenge because of the volume of information generated every day. In particular, respiratory diseases are poorly documented in databases, even though they are of increasing concern in terms of morbidity, mortality and economic impact. In this work, we present the results we obtained in the BioCreative Interactive Track (IAT), using a semiautomatic approach to improve the biocuration of miRNAs related to diseases. Our procedures will be useful for complementing databases that contain this type of information. We adapted the OntoGene text mining pipeline and the ODIN curation system to a full-text corpus of scientific publications concerning one specific respiratory disease: idiopathic pulmonary fibrosis, the most common and aggressive of the idiopathic interstitial pneumonias. Using the semiautomatic OntoGene/ODIN system, we curated 823 miRNA text snippets and found a total of 246 miRNAs related to this disease. Biocuration throughput improved by a factor of 12 compared with traditional manual biocuration. A significant advantage of our semiautomatic pipeline is that it can be applied to retrieve the miRNAs associated with all respiratory diseases and can potentially be extended to other illnesses.