Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia.
Life Sciences Division, College of Science & Engineering, Hamad Bin Khalifa University, HBKU, Doha, Qatar.
Bioinformatics. 2017 Sep 1;33(17):2723-2730. doi: 10.1093/bioinformatics/btx275.
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and knowledge graphs for data integration, retrieval and federated queries. In recent years, feature learning methods applicable to graph-structured data have become available, but they have not yet been widely applied to, or evaluated on, structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate node embeddings that encode related information within knowledge graphs. Because symbolic logic is used, these embeddings capture both explicit and implicit (inferred) information. We apply the embeddings to edge prediction in the knowledge graph, covering function prediction, finding candidate genes of diseases, protein-protein interactions and drug-target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the growing number of Semantic Web-based knowledge bases in biology to machine learning and data analytics.
https://github.com/bio-ontology-research-group/walking-rdf-and-owl.
robert.hoehndorf@kaust.edu.sa.
Supplementary data are available at Bioinformatics online.
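The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to combine inference with graph-based feature learning. It assumes a random-walk strategy, which the repository name suggests but the abstract does not confirm. The triples, predicate names and function names are all hypothetical, and a simple transitive closure over `subclass_of` stands in for a real automated reasoner. The sketch stops at producing walk sequences; in a full pipeline these would be passed to a neural language model (for example a skip-gram model) to obtain node embeddings.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("geneA", "has_function", "kinase_activity"),
    ("kinase_activity", "subclass_of", "catalytic_activity"),
    ("catalytic_activity", "subclass_of", "molecular_function"),
    ("geneA", "interacts_with", "geneB"),
    ("geneB", "associated_with", "diseaseX"),
]


def infer_superclasses(triples):
    """Add implied subclass_of edges by transitive closure.

    This is a simplified stand-in for an automated reasoner: it makes
    implicit subclass relations explicit before any walks are generated.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in closed if p == "subclass_of"]
        for a, b in sub:
            for c, d in sub:
                if b == c and (a, "subclass_of", d) not in closed:
                    closed.add((a, "subclass_of", d))
                    changed = True
    return closed


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency = defaultdict(list)
    for s, p, o in sorted(triples):  # sorted for reproducible ordering
        adjacency[s].append((p, o))
    return adjacency


def random_walk(adjacency, start, steps, rng):
    """Return a sequence alternating node and edge labels, starting at `start`."""
    walk = [start]
    node = start
    for _ in range(steps):
        edges = adjacency.get(node)
        if not edges:  # dead end: node has no outgoing edges
            break
        predicate, target = rng.choice(edges)
        walk.extend([predicate, target])
        node = target
    return walk


def build_corpus(triples, walks_per_node=3, steps=4, seed=0):
    """Generate walks from every node that has outgoing edges in the closed graph."""
    rng = random.Random(seed)
    adjacency = build_adjacency(infer_superclasses(triples))
    corpus = []
    for node in sorted(adjacency):
        for _ in range(walks_per_node):
            corpus.append(random_walk(adjacency, node, steps, rng))
    return corpus


if __name__ == "__main__":
    for sentence in build_corpus(TRIPLES):
        print(" ".join(sentence))
```

Each walk reads like a sentence of node and edge labels, so nodes that occur in similar contexts, including contexts that exist only after inference, would tend to receive similar embeddings once a language model is trained on the corpus. Predicting a missing edge can then be framed as a classification problem over pairs of node embeddings.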