Lee Po-Feng, Soo Von-Wun
Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:3507-10. doi: 10.1109/EMBC.2013.6610298.
Several computational approaches have been developed to solve the gene prioritization problem. We use ensemble boosting techniques to combine various computational approaches for gene prioritization in order to improve overall performance. In particular, we add a heuristic weighting function to the RankBoost algorithm based on: 1) the absolute ranks that the adopted methods assign to a given gene, and 2) the ranking relationships between all gene pairs in each prioritization result. We select 13 known prostate cancer genes from the OMIM database as the training set and the protein-coding genes in the HGNC database as the test set. We adopt a leave-one-out strategy for the ensemble rank-boosting learning. The experimental results show that, when ranking the 13 known genes, our ensemble learning approach outperforms the four gene-prioritization methods in the ToppGene suite in terms of mean average precision, ROC and AUC.
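The abstract describes combining several prioritization methods with RankBoost but does not give the heuristic weighting function itself. As a minimal sketch, assuming the standard RankBoost update of Freund et al. for weak rankers with scores in [0, 1], each method's ranking can serve as a weak ranker and each (other gene, known disease gene) pair as a crucial pair that should be ordered correctly. The names `rankboost`, `rank_to_score` and `pair_weight` are hypothetical; `pair_weight` only marks where a heuristic such as the authors' could set the initial pair distribution, and by default that distribution is uniform.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (lower_gene, higher_gene): higher_gene should be ranked above lower_gene.
Pair = Tuple[str, str]


def rank_to_score(ranking: Sequence[str]) -> Dict[str, float]:
    """Map a ranked gene list (best first) to scores in [0, 1]; the best gene gets 1.0."""
    n = len(ranking)
    if n == 1:
        return {ranking[0]: 1.0}
    return {gene: 1.0 - i / (n - 1) for i, gene in enumerate(ranking)}


def rankboost(
    rankings: List[Sequence[str]],
    pairs: List[Pair],
    rounds: int = 10,
    pair_weight: Optional[Callable[[Pair], float]] = None,  # hypothetical heuristic hook
) -> Dict[str, float]:
    """Combine several rankings with RankBoost; returns a combined score per gene."""
    if not pairs:
        raise ValueError("need at least one crucial pair")
    scores = [rank_to_score(r) for r in rankings]  # weak rankers h_m(x) in [0, 1]
    # Initial distribution D over crucial pairs: uniform unless a heuristic is supplied.
    weights = [pair_weight(p) if pair_weight else 1.0 for p in pairs]
    total = sum(weights)
    dist = [w / total for w in weights]
    alphas = [0.0] * len(rankings)

    for _ in range(rounds):
        # r_m = sum over pairs of D(x0, x1) * (h_m(x1) - h_m(x0)); pick the largest |r_m|.
        best_m, best_r = 0, 0.0
        for m, h in enumerate(scores):
            r = sum(d * (h.get(hi, 0.0) - h.get(lo, 0.0)) for d, (lo, hi) in zip(dist, pairs))
            if abs(r) > abs(best_r):
                best_m, best_r = m, r
        if abs(best_r) < 1e-12:
            break  # no weak ranker separates the weighted pairs any further
        r = max(min(best_r, 1 - 1e-9), -1 + 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        alphas[best_m] += alpha
        h = scores[best_m]
        # Up-weight pairs the chosen ranker misorders, down-weight the ones it orders well.
        dist = [
            d * math.exp(alpha * (h.get(lo, 0.0) - h.get(hi, 0.0)))
            for d, (lo, hi) in zip(dist, pairs)
        ]
        z = sum(dist)
        dist = [d / z for d in dist]

    # Final ranking function: H(x) = sum_m alpha_m * h_m(x).
    genes = set().union(*(s.keys() for s in scores))
    return {g: sum(a * s.get(g, 0.0) for a, s in zip(alphas, scores)) for g in genes}


if __name__ == "__main__":
    # Hypothetical toy data: two methods ranking four genes, two of them "known".
    method_a = ["G1", "G2", "G3", "G4"]
    method_b = ["G3", "G1", "G4", "G2"]
    known = {"G1", "G3"}
    pairs = [(o, k) for k in sorted(known) for o in method_a if o not in known]
    combined = rankboost([method_a, method_b], pairs, rounds=5)
    print(sorted(combined, key=combined.get, reverse=True))
```

In this sketch, the method whose ranking best separates the currently hard pairs receives a larger weight α, and pairs it still misorders are up-weighted for the next round; the combined score is the α-weighted sum of the per-method scores.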
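The evaluation measures named in the abstract can be made concrete with their standard definitions on a ranked gene list. The sketch below assumes each prioritization run yields a ranked list of candidate genes plus a set of known (relevant) genes; how the paper aggregates these over its leave-one-out folds is not stated in the abstract, and the function names and gene identifiers are hypothetical.

```python
from typing import List, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the positions k where a relevant gene appears."""
    hits, precisions = 0, []
    for k, gene in enumerate(ranking, start=1):
        if gene in relevant:
            hits += 1
            precisions.append(hits / k)
    # Relevant genes missing from the ranking contribute a precision of 0.
    return sum(precisions) / len(relevant) if relevant else 0.0


def roc_auc(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of (relevant, non-relevant) pairs in which the relevant gene ranks higher."""
    n_pos = sum(1 for g in ranking if g in relevant)
    n_neg = len(ranking) - n_pos
    pos_seen, correct_pairs = 0, 0
    for gene in ranking:
        if gene in relevant:
            pos_seen += 1
        else:
            correct_pairs += pos_seen  # every relevant gene seen so far outranks this one
    return correct_pairs / (n_pos * n_neg) if n_pos and n_neg else 0.0


def mean_average_precision(runs: List[Sequence[str]], relevant_sets: List[Set[str]]) -> float:
    """Average of per-run average precision (e.g. one run per leave-one-out fold)."""
    return sum(average_precision(r, s) for r, s in zip(runs, relevant_sets)) / len(runs)
```

With a single held-out gene per leave-one-out fold, average precision reduces to 1/rank of that gene, so MAP over the folds equals the mean reciprocal rank of the held-out genes; ROC AUC counts how often a known gene is placed above a non-known one.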