Wu Yao, Zhu Donghua, Wang Xuefeng, Zhang Shuo
School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China.
Comput Biol Chem. 2021 Dec;95:107566. doi: 10.1016/j.compbiolchem.2021.107566. Epub 2021 Aug 24.
To explore the pathogenic mechanisms of microRNAs (miRNAs) in diverse diseases, many researchers have focused on discovering potential miRNA-disease associations using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated non-associated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework consists of two steps: a novel semi-supervised Kmeans (SS-Kmeans) method that extracts reliable negative samples from unknown miRNA-disease pairs, and a subagging method that generates diverse training sample sets so that these reliable negative samples are fully exploited in ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirmed the framework's efficacy in identifying miRNA-disease associations.
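The three-stage pipeline described above (reliable-negative extraction, subagging, RVFL base learner) can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The paper's SS-Kmeans is replaced by a hypothetical constrained K-means (`seeded_kmeans_negatives`) in which all known positives are pinned to one cluster, and unlabeled pairs outside that cluster, farthest from the positive centroid, are taken as reliable negatives. The balanced negative subsample in `subagging_rvfl`, and the sigmoid hidden layer, uniform weight range, and ridge penalty in `RVFL`, are likewise illustrative choices. All function names and parameters are hypothetical, and the demo uses synthetic Gaussian features rather than real miRNA-disease data.

```python
import numpy as np


def _sq_dist(A, B):
    # Pairwise squared Euclidean distances, shape (len(A), len(B)).
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def seeded_kmeans_negatives(X_pos, X_unl, k=4, n_neg=None, n_iter=50, seed=0):
    """Hypothetical stand-in for SS-Kmeans (NOT the paper's algorithm):
    constrained K-means where every known positive is pinned to cluster 0."""
    rng = np.random.default_rng(seed)
    # Cluster 0 is seeded at the positive mean; the others at random unlabeled points.
    init = rng.choice(len(X_unl), k - 1, replace=False)
    centers = np.vstack([X_pos.mean(axis=0), X_unl[init]])
    for _ in range(n_iter):
        labels = _sq_dist(X_unl, centers).argmin(axis=1)
        new_centers = centers.copy()
        # Positives always contribute to cluster 0 (the semi-supervised constraint).
        new_centers[0] = np.vstack([X_pos, X_unl[labels == 0]]).mean(axis=0)
        for j in range(1, k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                new_centers[j] = X_unl[labels == j].mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = _sq_dist(X_unl, centers).argmin(axis=1)
    # Reliable negatives: unlabeled points outside the positive cluster,
    # ordered from farthest to nearest to the positive centroid.
    cand = np.flatnonzero(labels != 0)
    order = cand[np.argsort(-_sq_dist(X_unl[cand], centers[:1])[:, 0])]
    return X_unl[order[:n_neg]]


class RVFL:
    """Minimal RVFL network: random sigmoid hidden layer plus direct input links;
    only the output weights are learned, in closed form by ridge regression."""

    def __init__(self, n_hidden=50, reg=1e-2, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _features(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # random, untrained hidden layer
        return np.hstack([X, H, np.ones((len(X), 1))])     # direct links + hidden + bias

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, self.n_hidden)
        D = self._features(X)
        # beta = (D^T D + reg * I)^(-1) D^T y
        self.beta = np.linalg.solve(D.T @ D + self.reg * np.eye(D.shape[1]), D.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta


def subagging_rvfl(X_pos, X_neg, n_models=10, seed=0):
    """Subagging: each RVFL sees all positives plus a subsample of the reliable
    negatives drawn WITHOUT replacement; member scores are averaged."""
    rng = np.random.default_rng(seed)
    m = min(len(X_pos), len(X_neg))  # balanced negative subsample size (assumption)
    models = []
    for i in range(n_models):
        idx = rng.choice(len(X_neg), m, replace=False)
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(m)])
        models.append(RVFL(seed=seed + i).fit(X, y))
    return lambda X_new: np.mean([mdl.predict(X_new) for mdl in models], axis=0)


# Toy usage on synthetic features (no real miRNA-disease data involved).
rng = np.random.default_rng(42)
X_pos = rng.normal(2.0, 0.5, (40, 5))                  # known associations
X_unl = np.vstack([rng.normal(2.0, 0.5, (20, 5)),      # hidden positives
                   rng.normal(-2.0, 0.5, (200, 5))])   # true non-associations
X_neg = seeded_kmeans_negatives(X_pos, X_unl, k=4, n_neg=100)
score = subagging_rvfl(X_pos, X_neg, n_models=5)
print(score(X_unl[:20]).mean(), score(X_unl[20:]).mean())
```

Subagging differs from ordinary bagging in drawing each training subset without replacement (`replace=False`); averaging the members' scores lets the ensemble draw on more of the reliable negatives than any single balanced training set could.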