IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):491-500. doi: 10.1109/TCBB.2020.3003830. Epub 2022 Feb 3.
The majority of clinical trials fail due to low efficacy of the investigated drugs, often resulting from a poor choice of target protein. Existing computational approaches aim to support target selection either via genetic evidence or by placing potential targets in the context of a disease-specific network reconstruction. The purpose of this work was to investigate whether network representation learning techniques could enable a machine-learning-based prioritization of putative targets. We propose a novel target prioritization approach, GuiltyTargets, which relies on attributed network representation learning of a genome-wide protein-protein interaction network annotated with disease-specific differential gene expression and uses positive-unlabeled (PU) machine learning for candidate ranking. We evaluated our approach on 12 datasets from six diseases of different types (cancer, metabolic, neurodegenerative) within a 10-times-repeated 5-fold stratified cross-validation and achieved AUROC values between 0.92 and 0.97, significantly outperforming previous approaches that relied on manually engineered topological features. Moreover, we showed that GuiltyTargets allows for target repositioning across related disease areas. An application of GuiltyTargets to Alzheimer's disease resulted in a number of highly ranked candidates that are currently discussed as targets in the literature. Interestingly, one of them (COMT) is also the target of a drug (Tolcapone) approved for Parkinson's disease, highlighting the potential for target repositioning with our method. The GuiltyTargets Python package is available on PyPI, and all code used for the analysis can be found under the MIT License at https://github.com/GuiltyTargets.
Attributed network representation learning techniques provide an interesting approach to effectively leverage existing knowledge about the molecular mechanisms of different diseases. In this work, their combination with positive-unlabeled learning for target prioritization clearly outperformed classical feature engineering approaches. Our work highlights the potential of attributed network representation learning for target prioritization. Given the overarching relevance of networks in computational biology, we believe that attributed network representation learning techniques could have a broader impact in the future.
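The evaluation protocol described above (positive-unlabeled ranking scored by AUROC in a 10-times-repeated 5-fold stratified cross-validation) can be illustrated with a minimal sketch. This is not the GuiltyTargets implementation: the synthetic vectors stand in for learned node embeddings, the centroid-based scorer stands in for the actual classifier, treating unlabeled proteins as negatives is a common PU simplification assumed here, and all function names are hypothetical.

```python
import random
from statistics import mean


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split indices into k folds that preserve the positive/unlabeled ratio."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid_scores(X, labels, train, test):
    """Toy stand-in classifier: project each test vector onto the direction
    from the unlabeled centroid to the positive centroid (training data only)."""
    dim = len(X[0])

    def centroid(cls):
        rows = [X[i] for i in train if labels[i] == cls]
        return [mean(r[d] for r in rows) for d in range(dim)]

    mu_pos, mu_unl = centroid(1), centroid(0)
    w = [p - u for p, u in zip(mu_pos, mu_unl)]
    return [sum(wd * xd for wd, xd in zip(w, X[i])) for i in test]


def repeated_cv_auroc(X, labels, repeats=10, k=5, seed=0):
    """Return one AUROC per held-out fold across all repetitions."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            scores = centroid_scores(X, labels, train, test)
            results.append(auroc(scores, [labels[i] for i in test]))
    return results


# Synthetic data (assumption): 20 "known targets" (label 1) and 180 unlabeled
# proteins (label 0, treated as negatives), each with an 8-dimensional vector.
data_rng = random.Random(42)
labels = [1] * 20 + [0] * 180
X = [[data_rng.gauss(1.5 if y else 0.0, 1.0) for _ in range(8)] for y in labels]

aucs = repeated_cv_auroc(X, labels)
print(f"{len(aucs)} held-out folds, mean AUROC = {mean(aucs):.3f}")
```

Stratifying the folds matters in this setting because known targets are rare relative to unlabeled proteins; without it, a held-out fold could contain no positives and its AUROC would be undefined.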