Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities.

Affiliations

Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, No. 40-1, Beijing South Road, Urumqi 830011, Xinjiang, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Gigascience. 2020 Jun 1;9(6). doi: 10.1093/gigascience/giaa032.

DOI:10.1093/gigascience/giaa032

PMID:32533701

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7293023/

Abstract

BACKGROUND

The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for thoroughly understanding life activities in cells. However, few computational models aggregate diverse bioentities to comprehensively reveal the physical and functional landscape of biological systems.

RESULTS

We constructed a molecular association network that contains 18 types of edges (relationships) among 8 types of nodes (bioentities). Based on this network, we propose Bioentity2vec, a new method for representing bioentities that integrates information about the attributes and behaviors of each bioentity. Using a random forest classifier, we achieved promising performance on the 18 relationships, with an area under the receiver operating characteristic curve (AUC) of 0.9608 and an area under the precision-recall curve (AUPR) of 0.9572.
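The two evaluation metrics reported above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the toy labels and scores, and the idea of forming a pair feature by concatenating each entity's attribute and behavior vectors are assumptions made for illustration only. The AUPR here is computed as step-wise average precision, which is one common estimator among several.

```python
from typing import List, Sequence


def pair_features(attr_a: Sequence[float], behav_a: Sequence[float],
                  attr_b: Sequence[float], behav_b: Sequence[float]) -> List[float]:
    """Hypothetical pair descriptor: concatenate the attribute and behavior
    vectors of both entities (an assumption, not confirmed by the abstract)."""
    return [*attr_a, *behav_a, *attr_b, *behav_b]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).
    Tied scores are ordered arbitrarily, so avoid ties in the input."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


# Toy example: 1 = known relationship, 0 = sampled non-relationship.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]  # e.g. classifier probabilities
print(round(roc_auc(labels, scores), 4))
print(round(average_precision(labels, scores), 4))
```

In practice the scores would come from a trained classifier (a random forest in the study) applied to held-out entity pairs, and library routines for these metrics would normally replace the hand-written versions.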

CONCLUSIONS

Our study shows that constructing a network with rich topological and biological information is important for a systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information for classification tasks. Our method can also simultaneously predict relationships within a single entity type and across multiple entity types, which should accelerate progress in biological experimental research and industrial product development.

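Abstract (translated from Chinese)

Background

The explosive growth of genomic, chemical, and pathological data offers new opportunities and challenges for thoroughly understanding life activities in cells. However, few computational models yet integrate diverse bioentities to comprehensively reveal the physical and functional landscape of biological systems.

Results

We constructed a molecular association network containing 18 edges (relationships) among 8 nodes (bioentities). On this basis, we propose Bioentity2vec, a new bioentity representation method that integrates the attribute and behavior information of bioentities. Applying a random forest classifier, we achieved promising performance on the 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572.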