Saikia Manaswita, Bhattacharyya Dhruba K, Kalita Jugal K
Department of Computer Science and Engineering, Tezpur University, Napaam, Tezpur, Assam 784028 India.
Department of Computer Science, College of Engineering and Applied Science, University of Colorado, Colorado Springs, CO 80918 USA.
SN Comput Sci. 2023;4(2):114. doi: 10.1007/s42979-022-01492-4. Epub 2022 Dec 21.
This paper presents a consensus-based approach that incorporates three microarray and three RNA-Seq methods for unbiased and integrative identification of differentially expressed genes (DEGs) as potential biomarkers for critical disease(s). The proposed method performs satisfactorily on two microarray datasets (GSE20347 and GSE23400) and one RNA-Seq dataset (GSE130078) for esophageal squamous cell carcinoma (ESCC). Depending on the input dataset, our framework employs specific differential expression (DE) methods to detect DEGs independently. We introduce a consensus-based function that first considers the DEGs common to all three methods for downstream analysis, and then employs other parameters to overcome information loss. Differential co-expression (DCE) analysis and preservation analysis of the DEGs facilitate the study of how interactions among DEGs change between normal and diseased conditions. Taking hub genes in biologically relevant modules and the most GO- and pathway-enriched DEGs as candidate biomarkers for ESCC, we validate them further through biological analysis and literature evidence. We have identified 25 DEGs that have strong biological relevance to their respective datasets and that previous literature supports as potential biomarkers for ESCC. We have further identified 8 additional DEGs as probable biomarkers for ESCC, for which we recommend further in-depth analysis.
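The consensus step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the "other parameters" used to overcome information loss, so this sketch assumes a simple vote threshold (`min_votes`) as a stand-in. The function name `consensus_degs`, the method labels, and the gene names are all hypothetical.

```python
from collections import Counter


def consensus_degs(method_results, min_votes=2):
    """Split DEG calls into a strict consensus set and a relaxed set.

    method_results: dict mapping a DE method name to the set of genes it
    called as differentially expressed.
    min_votes: assumed stand-in for the paper's unspecified "other
    parameters"; genes called by at least this many methods, but not by
    all of them, go into the relaxed set.
    """
    # Count how many methods called each gene.
    votes = Counter(g for genes in method_results.values() for g in set(genes))
    n_methods = len(method_results)

    # Strict consensus: genes called by every method.
    strict = {g for g, v in votes.items() if v == n_methods}
    # Relaxed set: genes that narrowly miss full agreement.
    relaxed = {g for g, v in votes.items() if min_votes <= v < n_methods}
    return strict, relaxed


# Hypothetical DEG calls from three DE methods on one dataset.
calls = {
    "method_a": {"GENE1", "GENE2", "GENE3"},
    "method_b": {"GENE1", "GENE2", "GENE4"},
    "method_c": {"GENE1", "GENE3", "GENE5"},
}

strict, relaxed = consensus_degs(calls)
print(sorted(strict))   # ['GENE1']
print(sorted(relaxed))  # ['GENE2', 'GENE3']
```

Under these assumptions, the strict set would feed the downstream analysis first, while the relaxed set marks genes that a pure three-way intersection would discard.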