Information Technology Engineering, Al-Quds University, Abu Dis, Palestine.
The Wistar Institute, Philadelphia, PA, 19104, USA.
Sci Rep. 2022 Nov 19;12(1):19955. doi: 10.1038/s41598-022-24421-0.
The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, integrating prior knowledge-based approaches into this process has shown significant promise for discovering new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge to identify gene Groups shown to be associated with a specific disease, such as a cancer. The novelty of GediNET is that it also allows the discovery of significant associations between that specific disease and other diseases. The first step in this process is the identification of gene Groups. The Groups are then passed to a Scoring component that ranks them by classification performance. The top-ranked gene Groups are then used to train a Machine Learning Model. GediNET uses this process of Grouping, Scoring and Modelling (G-S-M) to identify other diseases that are similarly associated with the resulting gene signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships that could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from: https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS .
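The Grouping, Scoring and Modelling steps described above can be illustrated with a minimal Python sketch. It is not the GediNET KNIME workflow: it assumes that each Group is the set of genes annotated to one disease, and it substitutes a leave-one-out nearest-centroid classifier for the Scoring component. All function names, the `gene_to_diseases` map, and the `top_k` parameter are hypothetical and chosen only for illustration.

```python
from statistics import mean


def group_genes(gene_to_diseases):
    """Grouping: invert a gene -> diseases map into disease -> set of genes.

    Assumption: one Group per disease, built from prior knowledge.
    """
    groups = {}
    for gene, diseases in gene_to_diseases.items():
        for disease in diseases:
            groups.setdefault(disease, set()).add(gene)
    return groups


def centroid_predict(train, train_labels, sample, genes):
    """Predict the label whose class centroid is closest to the sample."""
    centroids = {}
    for label in set(train_labels):
        rows = [s for s, l in zip(train, train_labels) if l == label]
        centroids[label] = {g: mean(r[g] for r in rows) for g in genes}

    def sq_dist(centroid):
        return sum((sample[g] - centroid[g]) ** 2 for g in genes)

    # Sorting the labels makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def score_group(samples, labels, genes):
    """Scoring: leave-one-out accuracy using only the genes of one Group.

    Stand-in scorer; assumes every class keeps at least one training sample.
    """
    hits = 0
    for i, sample in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if centroid_predict(train, train_labels, sample, genes) == labels[i]:
            hits += 1
    return hits / len(samples)


def gsm(samples, labels, gene_to_diseases, top_k=1):
    """Run Grouping and Scoring, then return the top Groups and their genes.

    The returned gene signature is what a final Model would be trained on.
    """
    groups = group_genes(gene_to_diseases)
    scores = {d: score_group(samples, labels, sorted(g)) for d, g in groups.items()}
    # Rank by descending score, breaking ties alphabetically by disease name.
    ranked = sorted(groups, key=lambda d: (-scores[d], d))
    top = ranked[:top_k]
    signature = sorted(set().union(*(groups[d] for d in top)))
    return top, signature
```

In this sketch, each sample is a dictionary from gene name to expression value, and `labels` holds the class of each sample (for example, case or control for the disease under study). The diseases whose Groups rank highest are the candidates for a disease-disease association with the studied disease, and the union of their genes forms the signature on which a final classifier would be trained.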