IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1550-1560. doi: 10.1109/TCBB.2017.2684127. Epub 2017 Mar 17.
Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein function) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, an experimental comparison carried out on three model organisms shows that our method has a more stable performance in both "protein-centric" and "function-centric" evaluation settings.
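The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It runs ordinary label propagation over a protein similarity graph, one column per function (task), and then applies a hypothetical coupling step in which each task's score is reduced by the scores of tasks marked as dissimilar. The function names, the parameters `alpha` and `beta`, the coupling formula, and the toy data are all assumptions made for illustration.

```python
def normalize_rows(W):
    """Divide each row of W by its sum (all-zero rows stay zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s > 0 else 0.0 for w in row])
    return out


def propagate(W, Y, alpha=0.8, iters=50):
    """Standard iterative label propagation: F <- alpha * P F + (1 - alpha) * Y."""
    P = normalize_rows(W)
    n, t = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(P[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k]
              for k in range(t)]
             for i in range(n)]
    return F


def couple_tasks(F, D, beta=0.5):
    """Hypothetical coupling step (an assumption, not taken from the paper):
    lower each task's score by the beta-weighted scores of dissimilar tasks."""
    t = len(D)
    return [[row[k] - beta * sum(D[k][m] * row[m] for m in range(t) if m != k)
             for k in range(t)]
            for row in F]


# Toy data: four proteins connected in a chain 0-1-2-3.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
# Column 0 is a frequent function, column 1 a rare one.
Y = [[1, 0],
     [1, 0],
     [0, 0],
     [0, 1]]
# Task dissimilarity matrix: the two functions are marked as dissimilar.
D = [[0, 1],
     [1, 0]]

F = propagate(W, Y)
scores = couple_tasks(F, D)
```

In this toy setting, the coupling step lowers the rare-function score of proteins that score highly for the frequent function, which is one simple way to picture how dissimilarity information can push rare labels away from frequent ones.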