Sikirzhytskaya Aliaksandra, Tyagin Ilya, Sutton S Scott, Wyatt Michael D, Safro Ilya, Shtutman Michael
bioRxiv. 2024 Jun 9:2024.06.06.597745. doi: 10.1101/2024.06.06.597745.
Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively developing novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. The huge volume of new scientific information necessitates new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data, we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space in which each gene and health condition is represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created from AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was used to categorize samples (122 diseases and 20,889 genes) and to predict their assignment to specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that could be repurposed to treat dementia arising as an outcome of neurodegenerative diseases. We therefore identified dementia-associated genes that also rank statistically high in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. These sets of genes were classified by their presence in biological pathways, which aids in selecting candidate genes and biological processes that can be exploited for drug repurposing.
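The idea of detecting genes common to multiple health conditions can be illustrated with a minimal sketch. It assumes each disease class comes with a per-gene ranking score and treats a gene as shared when it falls in the top k of at least two classes. The function names, class labels, gene identifiers, scores, and the top-k rule are all hypothetical and chosen only for illustration; the abstract does not specify the mechanism actually used.

```python
from collections import defaultdict


def top_genes(scores, k):
    """Return the set of the k highest-scoring genes for one disease class."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {gene for gene, _ in ranked[:k]}


def shared_genes(class_scores, k, min_classes=2):
    """Map each gene to the classes where it ranks in the top k.

    Only genes that appear in at least `min_classes` classes are kept.
    """
    membership = defaultdict(set)
    for cls, scores in class_scores.items():
        for gene in top_genes(scores, k):
            membership[gene].add(cls)
    return {
        gene: sorted(classes)
        for gene, classes in membership.items()
        if len(classes) >= min_classes
    }


# Hypothetical scores, invented purely for illustration (not real data).
class_scores = {
    "dementia": {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.2},
    "other_class_1": {"GENE_B": 0.8, "GENE_D": 0.6, "GENE_A": 0.1},
    "other_class_2": {"GENE_D": 0.9, "GENE_B": 0.5, "GENE_C": 0.4},
}

result = shared_genes(class_scores, k=2)
print(result)
```

With these made-up inputs, GENE_B is highly ranked in all three classes and GENE_D in two, so both would be flagged as candidates shared across conditions, while GENE_A stays specific to the dementia class.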
This manuscript outlines our project involving the application of AGATHA, an AI-based literature mining tool, to discover drugs with the potential for repurposing in the context of neurocognitive disorders. The primary objective is to identify connections between approved medications and specific health conditions through advanced statistical analysis, including techniques like Partial Least Squares Discriminant Analysis (PLSDA) and unsupervised clustering. The methodology involves grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes are then analyzed through pathway analysis to select candidates for drug repurposing.
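To make the discrimination step more concrete, the following is a minimal sketch of PLSDA in its simplest form: a single latent component fitted to a binary class label, written in plain Python. It is not the authors' model. The number of components, the handling of many disease classes, the feature definitions, and the toy data below are all assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def fit_plsda_1comp(X, y):
    """Fit a one-component PLS model to binary labels y (0 or 1)."""
    n, p = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    # Center features and labels.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: feature-space direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each sample onto the latent direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered label on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict(model, X, threshold=0.5):
    """Assign class 1 when the predicted label value exceeds the threshold."""
    labels = []
    for row in X:
        t = sum(
            (row[j] - model["x_mean"][j]) * model["w"][j]
            for j in range(len(row))
        )
        y_hat = model["y_mean"] + model["q"] * t
        labels.append(1 if y_hat > threshold else 0)
    return labels


# Toy data, invented for illustration: six samples, two features.
X = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3], [0.1, 0.8], [0.2, 1.0], [0.0, 0.9]]
y = [1, 1, 1, 0, 0, 0]

model = fit_plsda_1comp(X, y)
print(predict(model, X))
print(predict(model, [[1.0, 0.0], [0.0, 1.0]]))
```

The sign and size of each entry in the weight vector `w` indicate how strongly a feature pushes a sample toward one class or the other, which is the kind of quantity that can be ranked to extract class-specific lists. A multi-class analysis would typically use several components and one indicator column per class.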