Dursun Cagatay, Smith Jennifer R, Hayman G Thomas, Kwitek Anne E, Bozdag Serdar
Dept. of Biomedical Engineering, Marquette University - Medical College of Wisconsin, Milwaukee WI USA.
Rat Genome Database, Dept. of Biomedical Engineering, Department of Physiology, Medical College of Wisconsin, Milwaukee WI USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020 Dec;2020:146-149. doi: 10.1109/bibm49941.2020.9313595. Epub 2021 Jan 13.
Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of the deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Therefore, identifying genetic factors to understand and decrease the morbidity and mortality from complex diseases is an important and challenging task. With the generation of an unprecedented amount of multi-omics data, network-based methods have become popular for representing multilayered complex molecular interactions. In particular, node embeddings, the low-dimensional representations of nodes in a network, are used for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and the lack of context-specific datasets. Most node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat disease models to classify hypertension-related genes. Our method significantly outperformed state-of-the-art node embedding methods, with an AUC of 94.97% compared with 85.98% for the second-best performer, and predicted genes not previously implicated in hypertension.