College of Computer Science and Technology, Dalian University of Technology, Dalian, China.
Database (Oxford). 2018 Jan 1;2018:bay097. doi: 10.1093/database/bay097.
The precision medicine (PM) initiative promises to identify individualized treatments based on a patient's genetic profile and related responses. To help health professionals and researchers in the PM endeavor, BioCreative VI organized a PM Track to mine protein-protein interactions (PPI) affected by genetic mutations from the biomedical literature. In this paper, we present a neural network ensemble approach to identify relevant articles describing PPI affected by mutations. In this approach, several neural network models are used for document triage, and the ensemble performs better than any individual model. In the official runs, our best submission achieved an F-score of 69.04% in the BioCreative VI PM document triage task. In post-challenge analysis, to address the limited size of the training set, we incorporated a PPI pre-trained module into our approach, which further improved performance. Our best ensemble method achieves an F-score of 71.04% on the test set.
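As a rough illustration of the ensemble idea for document triage, the following minimal sketch averages per-document relevance probabilities from several models and scores the result with the F-score. The abstract does not state how the models are combined, so the averaging rule (soft voting), the 0.5 threshold, and the function names `ensemble_probabilities`, `triage`, and `f_score` are all assumptions made for illustration. The numbers in the usage example are toy values, not results from the paper.

```python
from typing import List, Sequence


def ensemble_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-document relevance probabilities across models.

    Assumption: soft voting with equal weights; the paper may combine
    its models differently.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_docs = len(model_probs[0])
    if any(len(p) != n_docs for p in model_probs):
        raise ValueError("all models must score the same documents")
    return [sum(col) / len(model_probs) for col in zip(*model_probs)]


def triage(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a document relevant (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def f_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F-score (harmonic mean of precision and recall) for the relevant class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy scores from three hypothetical models over four documents.
    model_probs = [
        [0.9, 0.4, 0.2, 0.7],
        [0.8, 0.6, 0.1, 0.3],
        [0.7, 0.7, 0.4, 0.4],
    ]
    gold = [1, 1, 0, 1]  # toy gold labels: 1 = relevant article
    pred = triage(ensemble_probabilities(model_probs))
    print(pred, round(f_score(gold, pred), 4))
```

Averaging probabilities before thresholding lets a confident model outweigh an uncertain one, which is one common reason an ensemble can outperform its individual members.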