Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa058.
Accumulating research has revealed that Piwi-interacting RNAs (piRNAs) regulate the development of germ and stem cells and are closely associated with the progression of many diseases. As the number of detected piRNAs increases rapidly, it is important to computationally identify new piRNA-disease associations at low cost and to provide candidate piRNA targets for disease treatment. However, learning effective association patterns from the known positive piRNA-disease associations and the large number of unknown piRNA-disease pairs is a challenging problem. In this study, we proposed a computational predictor called iPiDi-PUL to identify piRNA-disease associations. iPiDi-PUL extracted features of piRNA-disease associations from three biological data sources: piRNA sequence information, disease semantic terms and the available piRNA-disease association network. Principal component analysis (PCA) was then performed on these features to extract the key features. Training datasets were constructed from the known positive associations together with negative associations selected from the unknown pairs. Multiple random forest classifiers, each trained on a different training set, were then combined via an ensemble learning approach to produce the final predictions. Finally, the iPiDi-PUL web server was established at http://bliulab.net/iPiDi-PUL to help researchers explore the diseases associated with newly discovered piRNAs.
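Here is a minimal sketch of the positive-unlabeled ensemble idea described above. It assumes each piRNA-disease pair is already encoded as a numeric feature vector (for example, after PCA). Each ensemble member is trained on all known positive associations plus a random sample of unknown pairs treated as negatives, and the members' scores are averaged. The random negative sampling and score averaging are assumptions; the paper's actual selection strategy and merging rule may differ. The names `train_pu_ensemble`, `predict` and `NearestCentroid` are hypothetical. A simple nearest-centroid scorer stands in for the random forest classifiers used by iPiDi-PUL so the example runs with the standard library alone.

```python
import random
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical stand-in base learner (NOT the random forest used by iPiDi-PUL)."""

    def fit(self, positives, negatives):
        self.c_pos = centroid(positives)
        self.c_neg = centroid(negatives)
        return self

    def score(self, x):
        # Score in [0, 1]: 1.0 at the positive centroid, 0.0 at the negative one.
        d_pos = sq_dist(x, self.c_pos)
        d_neg = sq_dist(x, self.c_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def train_pu_ensemble(positives, unknowns, n_models=10, seed=0):
    """Train n_models learners; each sees all positives plus a fresh random
    sample of unknown pairs (as many as there are positives) labeled negative.
    Assumes both lists are non-empty."""
    rng = random.Random(seed)
    k = min(len(positives), len(unknowns))
    models = []
    for _ in range(n_models):
        negatives = rng.sample(unknowns, k)  # assumed: uniform random negatives
        models.append(NearestCentroid().fit(positives, negatives))
    return models


def predict(models, x):
    """Ensemble association score: mean of the members' scores (assumed rule)."""
    return mean(m.score(x) for m in models)


if __name__ == "__main__":
    # Toy 2-D features; real iPiDi-PUL features come from sequences,
    # disease semantics and the association network.
    positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
    unknowns = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.2, 0.0], [1.0, 0.95]]
    models = train_pu_ensemble(positives, unknowns, n_models=5, seed=42)
    print(predict(models, [1.0, 1.0]))  # close to 1: resembles known positives
    print(predict(models, [0.0, 0.0]))  # close to 0: resembles sampled negatives
```

Resampling negatives for each member means that an unknown pair which is actually a true association, like `[1.0, 0.95]` in the toy data, is labeled negative only in some members. This limits its influence on the averaged score, which is the motivation for training several classifiers on different training sets rather than one.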