Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757 Sankt Augustin, Germany.
Knowledge Management, German National Library of Medicine (ZB MED)-Information Centre for Life Sciences, Friedrich-Hirzebruch-Allee 4, Bonn 53115, Germany.
Database (Oxford). 2024 Aug 5;2024. doi: 10.1093/database/baae066.
MicroRNAs (miRNAs) play important roles in post-transcriptional processes and regulate major cellular functions. Abnormal regulation of miRNA expression has been linked to numerous human diseases, such as respiratory diseases, cancer, and neurodegenerative diseases. The latest miRNA-disease associations are predominantly reported in unstructured biomedical literature. Retrieving these associations manually is cumbersome and time-consuming because the number of publications keeps growing. We propose a deep learning-based text mining approach that extracts normalized miRNA-disease associations from biomedical literature. To train the deep learning models, we built a new training corpus and extended it by distant supervision using multiple external databases. A quantitative evaluation shows that the workflow achieves an area under the receiver operating characteristic curve (AUROC) of 98% on a held-out test set for the detection of miRNA-disease associations. We demonstrate the applicability of the approach by extracting new miRNA-disease associations from biomedical literature (PubMed and PubMed Central). Through quantitative analysis and evaluation on three different neurodegenerative diseases, we show that our approach can effectively extract miRNA-disease associations that are not yet available in public databases. Database URL: https://zenodo.org/records/10523046.
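Distant supervision creates training labels automatically by matching text against existing databases rather than by manual annotation. The following is a minimal sketch of the general idea for sentence-level miRNA-disease pairs, not the authors' implementation. It assumes that miRNA and disease mentions have already been recognized and normalized. The function `distant_label`, the set `KNOWN_ASSOCIATIONS`, and the identifiers in it are hypothetical toy placeholders, not real database entries.

```python
from itertools import product

# Hypothetical toy knowledge base standing in for associations collected from
# external databases. These are placeholders, not real miRNA-disease entries.
KNOWN_ASSOCIATIONS = {
    ("hsa-mir-A", "disease X"),
    ("hsa-mir-C", "disease Y"),
}


def distant_label(sentence_mirnas, sentence_diseases, known=KNOWN_ASSOCIATIONS):
    """Label every co-occurring (miRNA, disease) pair found in one sentence.

    A pair gets label 1 (positive) if it is present in the knowledge base,
    otherwise label 0. Mentions are assumed to be normalized identifiers.
    """
    return [
        (mirna, disease, int((mirna, disease) in known))
        for mirna, disease in product(sentence_mirnas, sentence_diseases)
    ]


if __name__ == "__main__":
    # One sentence mentioning two miRNAs and one disease yields two candidate pairs.
    print(distant_label(["hsa-mir-A", "hsa-mir-B"], ["disease X"]))
    # → [('hsa-mir-A', 'disease X', 1), ('hsa-mir-B', 'disease X', 0)]
```

Treating every unmatched pair as negative is the simplest choice. It is also noisy, because a database may simply not yet contain a true association, which is exactly the kind of association a literature-mining system aims to find.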