Sikirzhytskaya Aliaksandra, Tyagin Ilya, Sutton S Scott, Wyatt Michael D, Safro Ilya, Shtutman Michael
Res Sq. 2024 Aug 17:rs.3.rs-4750719. doi: 10.21203/rs.3.rs-4750719/v1.
Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively developing novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. The huge volume of new scientific information necessitates new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data, we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space in which each gene and health condition is represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial least squares discriminant analysis (PLS-DA) was used to categorize the samples (122 diseases and 20,889 genes) and predict their class membership. Advanced statistics were used to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that could be repurposed to treat dementia arising as an outcome of neurodegenerative diseases. We therefore determined which dementia-associated genes are statistically highly ranked in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. These sets of genes were classified based on their presence in biological pathways, aiding the selection of candidates and biological processes that are exploitable with drug repurposing.
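The idea of detecting genes common to multiple health conditions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the per-class gene lists are compared, so the sketch simply assumes that each disease class comes with a numeric score per gene and that a gene counts as "highly ranked" when it falls in the top k for that class. The function names `top_genes` and `shared_genes`, the class names, the gene labels, and all scores are hypothetical.

```python
from collections import defaultdict


def top_genes(scores, k):
    """Return the set of the k highest-scoring genes for one disease class.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])


def shared_genes(class_scores, k, min_classes=2):
    """Map each gene to the classes in whose top-k list it appears.

    Only genes found in at least `min_classes` classes are kept.
    """
    membership = defaultdict(list)
    for cls in sorted(class_scores):
        for gene in top_genes(class_scores[cls], k):
            membership[gene].append(cls)
    return {g: classes for g, classes in membership.items()
            if len(classes) >= min_classes}


# Hypothetical illustrative scores, not data from the study.
class_scores = {
    "dementia": {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.2},
    "diabetes": {"GENE_B": 0.8, "GENE_D": 0.6, "GENE_A": 0.1},
    "hypertension": {"GENE_B": 0.5, "GENE_C": 0.4, "GENE_D": 0.3},
}

shared = shared_genes(class_scores, k=2)
for gene, classes in shared.items():
    print(gene, "->", ", ".join(classes))
```

With these toy scores, only `GENE_B` falls in the top two of more than one class. Restricting the result to entries whose class list includes `"dementia"` would give the dementia-associated genes that also rank highly elsewhere, which is the kind of overlap the abstract describes.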