Sinha Koushiki, Chakraborty Sanchari, Bardhan Arohit, Saha Riju, Chakraborty Srijan, Biswas Surama
Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India.
Biochem Genet. 2024 Dec 6. doi: 10.1007/s10528-024-10987-z.
Identifying the set of genes collectively responsible for causing a disease from differential gene expression data is called the gene selection problem. Although many complex methodologies have been applied to gene selection formulated as an optimization problem, this study introduces a new simple, efficient, and biologically plausible solution procedure that focuses on the collective power of the targeted gene set to discriminate between diseased and normal gene expression profiles. The procedure uses Simulated Annealing to solve the underlying optimization problem and is termed here Differential Gene Expression Based Simulated Annealing (DGESA). The Ranked Variance (RV) method was applied to prioritize genes and form a reference set for comparison with the outcome of DGESA. In a case study on Eosinophilic Esophagitis (EoE) and other gastrointestinal diseases, RV identified the top 40 high-variance genes, which overlapped with the disease-causing genes identified by DGESA. DGESA identified 40 gene pathways each for EoE, Crohn's Disease (CD), and Ulcerative Colitis (UC), with 10 genes for EoE, 8 for CD, and 7 for UC confirmed in the literature. For EoE, confirmed genes include KRT79, CRISP2, IL36G, SPRR2B, SPRR2D, and SPRR2E. For CD, validated genes include NPDC1, SLC2A4RG, LGALS8, CDKN1A, XAF1, and CYBA. For UC, confirmed genes include TRAF3, BAG6, CCDC80, CDC42SE2, and HSPA9. Together, RV and DGESA help elucidate molecular signatures in gastrointestinal diseases. The validation of genes such as SPRR2B, SPRR2D, SPRR2E, and STAT6 for EoE demonstrates DGESA's efficacy and highlights potential targets for future research.
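The two procedures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the objective function, the neighbourhood move, or the cooling schedule used by DGESA, so everything below is an assumption made for illustration. In particular, separation_score (distance between class centroids divided by mean within-class spread), the single-gene swap move, the geometric cooling schedule, and all function and parameter names are hypothetical. The expression matrix is assumed to have samples as rows and genes as columns, with labels of 1 for diseased and 0 for normal.

```python
import math
import random

import numpy as np


def ranked_variance(expr, k):
    """Return the indices of the k genes (columns) with the highest variance."""
    variances = expr.var(axis=0)
    return np.argsort(variances)[::-1][:k]


def separation_score(expr, labels, genes):
    """Hypothetical objective (an assumption, not taken from the paper):
    distance between the diseased and normal centroids on the selected
    genes, divided by the mean within-class standard deviation."""
    sub = expr[:, genes]
    diseased, normal = sub[labels == 1], sub[labels == 0]
    between = np.linalg.norm(diseased.mean(axis=0) - normal.mean(axis=0))
    within = 0.5 * (diseased.std(axis=0).mean() + normal.std(axis=0).mean())
    return between / (within + 1e-9)


def anneal_gene_set(expr, labels, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over gene subsets of fixed size k (maximization)."""
    n_genes = expr.shape[1]
    if not 0 < k < n_genes:
        raise ValueError("k must satisfy 0 < k < number of genes")
    rng = random.Random(seed)
    current = rng.sample(range(n_genes), k)
    current_score = separation_score(expr, labels, current)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(steps):
        # Neighbour move: replace one selected gene with an unselected one.
        selected = set(current)
        new_gene = rng.randrange(n_genes)
        while new_gene in selected:
            new_gene = rng.randrange(n_genes)
        candidate = list(current)
        candidate[rng.randrange(k)] = new_gene
        cand_score = separation_score(expr, labels, candidate)
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling  # geometric cooling (assumed schedule)
    return sorted(best), best_score
```

With an expression matrix expr and a label vector labels loaded as NumPy arrays, ranked_variance(expr, 40) would give a variance-based reference set and anneal_gene_set(expr, labels, 40) a set chosen for its collective discriminating power; the overlap of the two index sets mirrors the comparison described in the abstract. Because the acceptance rule lets the search move to worse subsets early on, it is less likely than a greedy swap search to stall in a poor local optimum, which is the usual motivation for choosing simulated annealing here.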