An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data.

Author information

Wu Yao, Zhu Donghua, Wang Xuefeng, Zhang Shuo

Affiliation

School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China.

Publication information

Comput Biol Chem. 2021 Dec;95:107566. doi: 10.1016/j.compbiolchem.2021.107566. Epub 2021 Aug 24.

DOI:10.1016/j.compbiolchem.2021.107566

PMID:34534906

Abstract

To explore the pathogenic mechanisms of microRNAs (miRNAs) in diverse diseases, many researchers have concentrated on discovering potential miRNA-disease associations using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated uncorrelated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework comprises two steps: a novel semi-supervised K-means (SS-Kmeans) that extracts reliable negative samples from unknown miRNA-disease pairs, and a subagging method that generates diverse training sample sets so that those reliable negative samples are fully exploited for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy at identifying miRNA-disease associations.
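The abstract names three components (reliable-negative extraction, subagging, an RVFL predictor) but gives no algorithmic detail, so the following is only a minimal NumPy sketch of how such a PU pipeline can fit together, under our own assumptions. `seeded_kmeans_negatives` is a hypothetical seeded/constrained K-means stand-in, not the authors' SS-Kmeans: known positives are clamped to cluster 0, and unlabeled pairs in the clusters farthest from it are taken as reliable negatives. The subagging scheme (all positives plus an equal-size subset of negatives drawn without replacement), the tanh enhancement nodes, the ridge penalty and all function names and parameters are likewise assumptions, and the data are synthetic.

```python
import numpy as np


def seeded_kmeans_negatives(X_pos, X_unl, k=5, n_iter=50, seed=0):
    """Hypothetical stand-in for SS-Kmeans (NOT the authors' algorithm).

    Clusters positives and unlabeled samples together; cluster 0 is seeded
    with the positive mean and every known positive is clamped to it.
    Returns indices into X_unl of samples in the clusters farthest from
    cluster 0 ("reliable negatives"). Requires k >= 2.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([X_pos, X_unl])
    n_pos = len(X_pos)
    # Centroid 0 = positive mean; the others start at random unlabeled points.
    init = X_unl[rng.choice(len(X_unl), size=k - 1, replace=False)]
    C = np.vstack([X_pos.mean(axis=0), init])
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        labels[:n_pos] = 0  # supervision: known positives stay in cluster 0
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    # Keep roughly half of the non-positive clusters, farthest from cluster 0 first.
    dist = np.linalg.norm(C - C[0], axis=1)
    far = [j for j in np.argsort(-dist) if j != 0][: max(1, (k - 1) // 2)]
    return np.flatnonzero(np.isin(labels[n_pos:], far))


class RVFL:
    """Minimal random vector functional link network producing scores.

    Hidden weights are random and fixed; output weights over the direct
    input links, the tanh enhancement nodes and a bias are solved in
    closed form by ridge regression.
    """

    def __init__(self, n_hidden=50, reg=1.0, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _design(self, X):
        H = np.tanh(X @ self.W + self.b)                 # enhancement nodes
        return np.hstack([X, H, np.ones((len(X), 1))])   # direct links + bias

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
        D = self._design(X)
        A = D.T @ D + self.reg * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ y)
        return self

    def decision_function(self, X):
        return self._design(X) @ self.beta


def subagging_rvfl(X_pos, X_neg, n_models=10, seed=0):
    """Assumed subagging scheme: each base RVFL sees all positives plus an
    equal-size subset of reliable negatives drawn WITHOUT replacement."""
    rng = np.random.default_rng(seed)
    m = min(len(X_pos), len(X_neg))
    models = []
    for i in range(n_models):
        neg = X_neg[rng.choice(len(X_neg), size=m, replace=False)]
        X = np.vstack([X_pos, neg])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(m)])
        models.append(RVFL(seed=seed + i + 1).fit(X, y))
    return models


def ensemble_score(models, X):
    # Average the base learners' scores; higher = more likely associated.
    return np.mean([mdl.decision_function(X) for mdl in models], axis=0)


# Usage on synthetic feature vectors (not real miRNA-disease data):
rng = np.random.default_rng(42)
X_pos = rng.normal(2.0, 1.0, size=(50, 4))
X_unl = np.vstack([rng.normal(2.0, 1.0, size=(30, 4)),    # hidden positives
                   rng.normal(-2.0, 1.0, size=(170, 4))])  # true negatives
neg_idx = seeded_kmeans_negatives(X_pos, X_unl, k=5)
models = subagging_rvfl(X_pos, X_unl[neg_idx], n_models=10)
scores = ensemble_score(models, X_unl)
```

The design point the sketch tries to convey is the one the abstract argues: negatives come from a filtered subset of the unlabeled pairs rather than a random draw, and diversity among base learners comes from subsampling those negatives. When fewer reliable negatives than positives are found, each subset is simply all of them, and the ensemble loses that diversity.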

