College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.
Ministry of Industry and Information Technology, Key Laboratory of Advanced Marine Communication and Information Technology, Harbin, 150001, China.
Sci Rep. 2022 Apr 6;12(1):5819. doi: 10.1038/s41598-022-09672-1.
Growing evidence shows that long noncoding RNAs (lncRNAs) play an important role in cellular biological processes at multiple levels, such as gene imprinting, immune response, and genetic regulation, and that their complex and precise regulatory control links them closely to disease. However, the functions of most lncRNAs remain undiscovered. Current computational methods for exploring lncRNA functions can avoid high-throughput experiments, but they usually focus on constructing similarity networks and ignore the directed acyclic graph (DAG) formed by Gene Ontology annotations. In this paper, we treat function annotation as a hierarchical multilabel classification problem and design a method, HLSTMBD, for classification with DAG-structured labels. Built on a mathematical model based on Bayesian decision theory, the HLSTMBD algorithm is implemented with a long short-term memory (LSTM) network and a hierarchical constraint method, DAGLabel. Results on GOA-lncRNA datasets show that, compared with other state-of-the-art algorithms, the proposed method completes label prediction efficiently and accurately.
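To make the idea of DAG-structured labels concrete, the sketch below shows one common way to keep multilabel predictions consistent with a label hierarchy: a label's score is capped at the lowest score among its parents, so a term is never predicted more confidently than any of its ancestors. This is a generic illustration and is not the DAGLabel method or the Bayesian decision model of the paper, whose details the abstract does not give. The function name `enforce_dag_consistency`, the toy labels, and the scores are all hypothetical.

```python
from graphlib import TopologicalSorter


def enforce_dag_consistency(scores, parents):
    """Cap each label's score at the minimum adjusted score of its parents.

    scores  : dict mapping label -> raw score in [0, 1] (hypothetical classifier output)
    parents : dict mapping label -> list of parent labels (roots map to [])
    """
    # TopologicalSorter takes node -> predecessors, so every parent
    # is visited before any of its children.
    order = TopologicalSorter(parents).static_order()
    adjusted = {}
    for label in order:
        # A root has no parents, so its cap defaults to 1.0.
        cap = min((adjusted[p] for p in parents.get(label, [])), default=1.0)
        adjusted[label] = min(scores[label], cap)
    return adjusted


# Toy DAG (assumed for illustration): C has two parents, A and B.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
raw_scores = {"root": 0.9, "A": 0.6, "B": 0.3, "C": 0.8}

adjusted = enforce_dag_consistency(raw_scores, parents)
print(adjusted["C"])  # C is capped by its weaker parent, B
```

Because a node in a DAG can have several parents, the cap uses the minimum over all of them; in a tree the same rule reduces to comparing against the single parent.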