Applied Linguistics Group, School of College English Teaching and Research, Henan University, Kaifeng 475000, China.
Molecules. 2022 Jul 23;27(15):4710. doi: 10.3390/molecules27154710.
Noncoding RNAs (ncRNAs) are transcripts without protein-coding potential that play fundamental regulatory roles in diverse cellular processes and diseases. The application of deep sequencing experiments in ncRNA research has generated massive omics datasets, which require rapid examination, interpretation and validation based on existing knowledge resources. Thus, text-mining methods have been increasingly adapted for automatic extraction of relations between an ncRNA and its target or a disease condition from biomedical literature. These bioinformatics tools can also assist in more complex research, such as database curation of candidate ncRNAs and hypothesis generation with respect to pathophysiological mechanisms. In this concise review, we first introduced the basic concepts and workflow of literature mining systems. Then, we compared available bioinformatics tools tailored for ncRNA studies, including their tasks, applicability, and limitations. Their powerful utilities and flexibility are demonstrated by examples in a variety of diseases, such as Alzheimer's disease, atherosclerosis and cancers. Finally, we outlined several challenges from the viewpoints of both system developers and end users. We concluded that the application of text-mining techniques will boost disease-associated ncRNA discoveries in the biomedical literature and enable integrative biology in the current omics era.
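As a rough illustration of the relation-extraction task the abstract describes, the following minimal sketch pairs ncRNA mentions with disease mentions that co-occur in the same sentence. It is not the method of any tool covered by the review: the name pattern `NCRNA_PATTERN`, the tiny lexicon `DISEASE_TERMS`, and the function names are hypothetical and chosen only for this example. Real systems typically rely on curated dictionaries or trained named-entity recognizers and on more than simple co-occurrence.

```python
import re
from itertools import product

# Hypothetical, illustrative patterns: a few miRNA-style names and lncRNA symbols.
NCRNA_PATTERN = re.compile(r"\b(?:miR-\d+[a-z]?(?:-[35]p)?|MALAT1|HOTAIR|H19)\b")

# Hypothetical mini-lexicon built from the diseases named in the abstract.
DISEASE_TERMS = ("Alzheimer's disease", "atherosclerosis", "cancer")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_cooccurrences(text):
    """Return (ncRNA, disease, sentence) triples for mentions sharing a sentence."""
    pairs = []
    for sentence in split_sentences(text):
        # Deduplicate ncRNA mentions within the sentence and keep a stable order.
        ncrnas = sorted(set(NCRNA_PATTERN.findall(sentence)))
        lowered = sentence.lower()
        # Case-insensitive substring lookup against the disease lexicon.
        diseases = [d for d in DISEASE_TERMS if d.lower() in lowered]
        pairs.extend((r, d, sentence) for r, d in product(ncrnas, diseases))
    return pairs


if __name__ == "__main__":
    example = (
        "miR-21 is upregulated in breast cancer. "
        "MALAT1 and miR-146a are linked to atherosclerosis."
    )
    for ncrna, disease, sentence in extract_cooccurrences(example):
        print(f"{ncrna} <-> {disease} | {sentence}")
```

Sentence-level co-occurrence is used here only because it is the simplest baseline to show; it cannot tell the type or direction of a relation, which is one reason curated validation remains necessary.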