Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Dresden, Germany.
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Dresden, Germany.
BMC Bioinformatics. 2023 Jul 29;24(1):304. doi: 10.1186/s12859-023-05418-6.
Integrating multi-omics data is fast becoming a powerful approach for predicting disease progression and treatment outcomes. We therefore introduce a modified version of NetRank, a network-based algorithm for biomarker discovery that combines each protein's associations, co-expression, and function with its phenotypic association in order to distinguish between cancer types. We present NetRank as a robust feature selection method for biomarker selection in cancer prediction. We assess the robustness and suitability of RNA gene expression data by scanning genomic data for 19 cancer types, covering more than 3000 patients, from The Cancer Genome Atlas (TCGA).
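To illustrate the general idea of combining network connectivity with phenotypic association, the following is a minimal sketch of a PageRank-style score propagation. It is not the authors' implementation (which is distributed as an R library) and the abstract does not state the update rule. The function name `netrank_sketch`, the damping value, the toy gene names, and the toy scores are all assumptions made for illustration only.

```python
def netrank_sketch(adjacency, phenotype_score, damping=0.5, iterations=100, tol=1e-9):
    """Hypothetical PageRank-style ranking (illustrative, not the published code).

    adjacency       : dict mapping each gene to the set of its network neighbours
                      (assumed undirected, so the mapping is symmetric)
    phenotype_score : dict mapping each gene to a non-negative phenotype
                      association score, e.g. an absolute correlation
    damping         : weight given to the network term (assumed value)
    """
    genes = list(phenotype_score)
    total = sum(phenotype_score.values())
    # Normalise the phenotype scores so that they sum to 1.
    s = {g: phenotype_score[g] / total for g in genes}
    degree = {g: len(adjacency.get(g, ())) for g in genes}

    rank = dict(s)
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each neighbour shares its current rank equally among its own neighbours.
            flow = sum(rank[n] / degree[n] for n in adjacency.get(g, ()) if degree[n] > 0)
            new_rank[g] = (1 - damping) * s[g] + damping * flow
        delta = max(abs(new_rank[g] - rank[g]) for g in genes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy star-shaped network with made-up gene names and scores.
toy_network = {
    "geneA": {"geneB", "geneC", "geneD"},
    "geneB": {"geneA"},
    "geneC": {"geneA"},
    "geneD": {"geneA"},
}
toy_scores = {"geneA": 0.1, "geneB": 0.5, "geneC": 0.3, "geneD": 0.1}

ranks = netrank_sketch(toy_network, toy_scores)
top_genes = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy example, `geneA` and `geneD` start with the same phenotype score, but `geneA` ends up ranked higher because it is connected to strongly associated neighbours. This is the kind of effect a network-based ranking is meant to capture.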
Evaluating different cancer type profiles from the TCGA data demonstrates the strength of our approach in identifying interpretable biomarker signatures for cancer outcome prediction. NetRank's biomarkers segregate most cancer types with an area under the curve (AUC) above 90% using compact signatures.
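For readers unfamiliar with the metric, the AUC of a binary classifier can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name `pairwise_auc` and the example scores are hypothetical and are not taken from the paper's results.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores : predicted scores, higher meaning "more likely positive"
    labels : 1 for positive samples, 0 for negative samples
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores: two positive samples followed by two negative samples.
example_auc = pairwise_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Here three of the four positive/negative pairs are ordered correctly, so `example_auc` is 0.75. The quadratic pairwise loop is chosen for clarity and would be replaced by a rank-based computation on large cohorts.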
In this paper we provide a fast and efficient implementation of NetRank, together with a case study from The Cancer Genome Atlas to assess its performance. The implementation includes complete functionality for pre- and post-processing of RNA-seq gene expression data, along with functions for building protein-protein interaction networks. The source code of NetRank is freely available at github.com/Alfatlawi/Omics-NetRank as an installable R library. We also provide a comprehensive practical user manual, with examples and data, attached to this paper.