CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai, 200031, China.
Mol Divers. 2010 Nov;14(4):719-29. doi: 10.1007/s11030-009-9216-y. Epub 2009 Dec 30.
We used a machine learning method, the nearest neighbor algorithm (NNA), to learn the relationship between miRNAs and their target proteins and to build a predictor that judges whether a new miRNA-target pair is true. We acquired 198 positive (true) miRNA-target pairs from Tarbase and the literature, and generated 4,888 negative (false) pairs through random combination. miRNAs were encoded into vectors using a 0/1 system together with the frequencies of single nucleotides and di-nucleotides, while the targets were encoded using various physicochemical parameters. The NNA was then trained on these data to produce the predictor. We applied minimum redundancy maximum relevance (mRMR) and properties forward selection (PFS) to reduce the redundancy of the encoding system, obtaining the 91 most efficient properties. In the jackknife cross-validation test, the predictor achieved a positive accuracy of 69.2% and an overall accuracy of 96.0% with all 253 properties, and a positive accuracy of 83.8% and an overall accuracy of 97.2% with the 91 most efficient properties. A web server for predictions is available at http://app3.biosino.org:8080/miRTP/index.jsp.
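The workflow described in the abstract (frequency encoding, nearest neighbor prediction, jackknife testing) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, and the Euclidean distance is an assumption, since the abstract does not state a distance measure. Only the nucleotide-frequency part of the miRNA encoding is shown, and "positive accuracy" is assumed here to mean the fraction of positive pairs predicted correctly.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def encode_mirna(seq):
    """Return 4 single-nucleotide and 16 di-nucleotide frequencies (20 values)."""
    seq = seq.upper().replace("T", "U")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = [seq.count(base) / len(seq) for base in NUCLEOTIDES]
    # Overlapping di-nucleotides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    di = [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]
    return mono + di


def euclidean(a, b):
    """Assumed distance measure; the abstract does not specify one."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query, vectors, labels, skip=None):
    """Label of the training vector closest to `query`, ignoring index `skip`."""
    candidates = (i for i in range(len(vectors)) if i != skip)
    best = min(candidates, key=lambda i: euclidean(query, vectors[i]))
    return labels[best]


def jackknife(vectors, labels):
    """Leave-one-out test: predict each sample from all the other samples.

    Returns (positive_accuracy, overall_accuracy), where labels are 1 for
    positive pairs and 0 for negative pairs.
    """
    correct = [
        nearest_neighbor_label(v, vectors, labels, skip=i) == labels[i]
        for i, v in enumerate(vectors)
    ]
    positive_hits = [c for c, y in zip(correct, labels) if y == 1]
    positive_accuracy = sum(positive_hits) / len(positive_hits)
    overall_accuracy = sum(correct) / len(correct)
    return positive_accuracy, overall_accuracy
```

With 198 positive and 4,888 negative pairs, a predictor can score a high overall accuracy while still missing many positives, which is why the two accuracies are reported separately.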