Gholizadeh Maryam, Mazlooman Seyed Reza, Hadizadeh Morteza, Drozdzik Marek, Eslami Saeid
Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad 91388-13944, Iran.
Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran 1477893780, Iran.
MethodsX. 2023 Jan 18;10:102021. doi: 10.1016/j.mex.2023.102021. eCollection 2023.
One widely used approach to biomarker development is the precise detection of highly responsive genes that can distinguish cancer samples from healthy samples. The purpose of this study was to screen for potential hepatocellular carcinoma (HCC) biomarkers using a non-fusion integrative multi-platform meta-analysis method. Gene expression profiles of liver tissue samples from two microarray platforms were first analyzed with a meta-analysis based on an empirical Bayesian method to robustly discover genes differentially expressed between HCC and non-tumor tissues. The highly associated, prioritized differentially expressed genes (DEGs) were then clustered using weighted correlation network analysis. Co-expression network and topological analyses were used to identify sub-clusters and confirm candidate genes. Next, a diagnostic model was developed and validated using a machine learning algorithm. To construct a prognostic model, Cox proportional hazards regression analysis was applied and validated. We identified three genes as specific biomarkers for the diagnosis of HCC based on accuracy and feasibility. The diagnostic model's area under the curve was 0.931, with a confidence interval of 0.923-0.952.
•Non-fusion integrative multi-platform meta-analysis method.
•Classification methods and biomarker recognition via a machine learning method.
•Biomarker validation models.
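The diagnostic performance above is reported as an area under the ROC curve with a confidence interval. The abstract does not say how that interval was computed, so the following is only a minimal sketch of one common approach, a percentile bootstrap, run on synthetic data. The function names `roc_auc` and `bootstrap_auc_ci`, the toy scores, and the 95% level (`alpha=0.05`) are all illustrative assumptions, not the authors' procedure or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    roc_auc(labels, scores)  # raises early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic classifier scores: 50 "tumor" (label 1) and 50 "non-tumor" (label 0).
rng = random.Random(42)
labels = [1] * 50 + [0] * 50
scores = ([rng.gauss(1.5, 1.0) for _ in range(50)]
          + [rng.gauss(0.0, 1.0) for _ in range(50)])

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f}, bootstrap interval = ({ci_low:.3f}, {ci_high:.3f})")
```

The pairwise AUC computation is quadratic in sample size, which is acceptable for a small illustration; a rank-based formulation or an established statistics library would be preferable for real expression data.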