Muniyappan Saranya, Rayan Arockia Xavier Annie, Varrieth Geetha Thekkumpurath
Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.
J Biomed Inform. 2023 Nov;147:104528. doi: 10.1016/j.jbi.2023.104528. Epub 2023 Oct 18.
Drug repurposing (DR) is an emerging approach for identifying novel therapeutic indications for available drugs and for discovering treatments for previously untreatable diseases. DR has attracted considerable attention in the pharmaceutical industry because bringing new drugs to market through traditional drug development is costly and slow. The DR task depends largely on genetic information, since drugs revert the altered gene expression (GE) of a disease to normal. Many existing studies have not considered the importance of genetic information when predicting potential candidates.
We propose a novel multimodal framework that uses genetic aspects of drugs and diseases, such as genes, pathways, and gene signatures or expression, drawn from various data sources, to enhance DR performance. First, a heterogeneous biological network (HBN) is constructed with three types of nodes (drug, disease, and gene) and four types of edges: similarity (among drugs, genes, and diseases), drug-gene, gene-disease, and drug-disease. Next, a modified graph auto-encoder (GAE*) model is applied to learn representations of drug and disease nodes from the topological structure and edge information. The HBN is then enhanced with information extracted from biomedical literature and ontology, using a novel semi-supervised pattern embedding-based bootstrapping model and a novel DR perspective representation learning method, respectively, to improve prediction performance. Finally, the proposed system uses a neural network model to generate a probability score for each drug-disease pair.
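The network construction step can be illustrated with a minimal sketch. It assumes a plain undirected typed graph; the class `HeterogeneousNetwork`, the helper `build_toy_network`, and all node names (`drug_A`, `gene_1`, `disease_X`, and so on) are hypothetical placeholders rather than the authors' implementation or data. Only the three node types and four edge types come from the abstract.

```python
from collections import defaultdict

# Node and edge types as listed in the abstract.
NODE_TYPES = {"drug", "disease", "gene"}
EDGE_TYPES = {"similarity", "drug-gene", "gene-disease", "drug-disease"}


class HeterogeneousNetwork:
    """Undirected typed graph; a hypothetical stand-in for the HBN."""

    def __init__(self):
        self.node_type = {}            # node name -> node type
        self.adj = defaultdict(dict)   # node -> {neighbor: (edge type, weight)}

    def add_node(self, name, ntype):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[name] = ntype

    def add_edge(self, u, v, etype, weight=1.0):
        if etype not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {etype}")
        # Raises KeyError if either endpoint was never added.
        pair = {self.node_type[u], self.node_type[v]}
        if etype == "similarity":
            valid = len(pair) == 1                  # same-type endpoints only
        else:
            valid = pair == set(etype.split("-"))   # e.g. {"drug", "gene"}
        if not valid:
            raise ValueError(f"{etype} edge cannot join {u} and {v}")
        self.adj[u][v] = (etype, weight)
        self.adj[v][u] = (etype, weight)

    def neighbors(self, node, etype=None):
        """Sorted neighbors of a node, optionally filtered by edge type."""
        return sorted(
            v for v, (t, _) in self.adj.get(node, {}).items()
            if etype is None or t == etype
        )

    def shared_genes(self, drug, disease):
        """Genes linked to both the drug and the disease."""
        drug_genes = set(self.neighbors(drug, "drug-gene"))
        disease_genes = set(self.neighbors(disease, "gene-disease"))
        return sorted(drug_genes & disease_genes)


def build_toy_network():
    """Tiny illustrative network with made-up nodes and weights."""
    hbn = HeterogeneousNetwork()
    for name, ntype in [("drug_A", "drug"), ("drug_B", "drug"),
                        ("disease_X", "disease"),
                        ("gene_1", "gene"), ("gene_2", "gene")]:
        hbn.add_node(name, ntype)
    hbn.add_edge("drug_A", "drug_B", "similarity", weight=0.8)
    hbn.add_edge("drug_A", "gene_1", "drug-gene")
    hbn.add_edge("drug_B", "gene_2", "drug-gene")
    hbn.add_edge("gene_1", "disease_X", "gene-disease")
    hbn.add_edge("drug_B", "disease_X", "drug-disease")  # known indication
    return hbn
```

In this toy network, `drug_A` has no known link to `disease_X` but shares `gene_1` with it, which is the kind of gene-mediated evidence a representation learner over the network could exploit. The graph auto-encoder and the scoring network themselves are not sketched here, since the abstract does not specify their architecture.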
We demonstrate the effectiveness of the proposed model on various datasets, where it achieved strong performance in 5-fold cross-validation (AUC = 0.99, AUPR = 0.98). Further, we validated the top-ranked potential candidates using pathway analysis and showed that the known and predicted candidates share common genes in the pathways.
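For readers unfamiliar with the reported metrics, the following sketch shows one common way to compute AUC and AUPR and to split drug-disease pairs into five folds. It is an illustration under stated assumptions, not the authors' evaluation code: AUC is computed as the pairwise ranking statistic, AUPR is approximated by average precision, and the function names, the seed, and the example labels and scores are all hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, used here as an approximation of AUPR.

    Items are ranked by descending score; tied scores keep input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            precisions.append(true_positives / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs over shuffled indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits
```

In a cross-validation run, a model would be trained on each `train` split, scored on the matching `test` split, and the per-fold AUC and average precision averaged. For imbalanced drug-disease data, AUPR is usually the more informative of the two, because it is not inflated by the large number of true negatives.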