College of Informatics, Huazhong Agricultural University.
Department of Computer Science & Engineering, The Ohio State University.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa140.
MicroRNAs (miRNAs) play crucial roles in a wide range of biological processes associated with human diseases. Identifying potential miRNA-disease associations contributes to understanding the molecular mechanisms of miRNA-related diseases. Most existing computational methods focus on predicting whether a miRNA-disease association exists or not. However, the roles of miRNAs in diseases differ markedly. For instance, genetic variants of a miRNA (mir-15) may affect the expression level of miRNAs, leading to B cell chronic lymphocytic leukemia, while circulating miRNAs (including mir-1246, mir-1307-3p, etc.) have the potential to detect breast cancer at an early stage. In this paper, we aim to predict multi-type miRNA-disease associations instead of treating them as binary. To this end, we represent miRNA-disease-type triples as a tensor and introduce tensor decomposition methods to solve the prediction task. Experimental results on two widely adopted miRNA-disease datasets, HMDD v2.0 and HMDD v3.2, show that tensor decomposition methods improve substantially over a recent baseline (by up to 38% in Top-1 F1). We then propose a novel method, Tensor Decomposition with Relational Constraints (TDRC), which incorporates biological features as relational constraints to further improve existing tensor decomposition methods. Compared with two existing tensor decomposition methods, TDRC achieves better performance while being more efficient.
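To make the tensor formulation concrete, the following is a minimal sketch of how miRNA-disease-type triples could be stored as a third-order tensor and completed with a low-rank decomposition. It is not TDRC and does not use relational constraints. The abstract does not name the decomposition used, so a plain CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares is assumed here. All function names, the rank, and the toy indices are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_tensor(triples, n_mirna, n_disease, n_type):
    """X[i, j, k] = 1 if miRNA i is linked to disease j with association type k."""
    X = np.zeros((n_mirna, n_disease, n_type))
    for i, j, k in triples:
        X[i, j, k] = 1.0
    return X

def khatri_rao(A, B):
    """Column-wise Kronecker product: row (a * B.shape[0] + b) equals A[a] * B[b]."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            # Mode-n unfolding: rows index the current mode, columns the other two.
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            # Least-squares update of the current factor with the others fixed.
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    """Rebuild the full score tensor from the three factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Toy example (hypothetical indices): 4 miRNAs, 3 diseases, 2 association types.
triples = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (2, 2, 1), (3, 1, 0)]
X = build_tensor(triples, n_mirna=4, n_disease=3, n_type=2)
factors = cp_als(X, rank=2)
X_hat = reconstruct(factors)

# Score every association type for miRNA 1 and disease 1, an unobserved pair,
# and take the highest-scoring type as the Top-1 prediction.
scores = X_hat[1, 1, :]
top1_type = int(np.argmax(scores))
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

The reconstructed tensor `X_hat` assigns a real-valued score to every (miRNA, disease, type) cell, including cells that were zero in `X`, which is what allows the association type of an unobserved pair to be ranked rather than only its existence predicted. A method in the spirit of TDRC would additionally constrain the miRNA and disease factor matrices using biological similarity features; that step is omitted here because the abstract does not describe its form.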