Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh; Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh.
Comput Biol Med. 2024 Aug;178:108769. doi: 10.1016/j.compbiomed.2024.108769. Epub 2024 Jun 18.
Differential expression (DE) analysis between cell types in single-cell RNA sequencing (scRNA-seq) data is crucial, and it must capture the complicated features of such data. Recently, various methods have been developed for scRNA-seq data analysis; they differ in modeling framework, assumptions, strategy, and test statistic, and in the data features they account for. scDEA is a recently developed ensemble learning-based DE analysis method that uses Lancaster's combination to combine the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, that uses only the 4 top-performing individual methods. These 4 methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate the proposed method in terms of sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, matched DE genes identified, and semantic similarity between methods. We also performed similar analyses (except the last 3 items) on a simulated data set and computed performance measures such as accuracy, F1 score, and Matthews correlation coefficient. The results show that scHD4E performs better than all the individual methods and scDEA in all of the above respects. We expect that scHD4E will help data scientists detect differentially expressed genes (DEGs) in scRNA-seq data analysis. To implement the proposed method, a GitHub R package, scHD4E, and its Shiny application have been developed and are available at the following links: https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
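For readers unfamiliar with Lancaster's combination, the following is a minimal Python sketch of the general technique, not the scDEA or scHD4E implementation. Each p-value is mapped to a chi-square quantile whose degrees of freedom equal its weight, the quantiles are summed, and the sum is referred to a chi-square distribution with the summed degrees of freedom. The function names, the example weights, and the restriction to even integer weights (which keeps the example dependency-free) are assumptions made for illustration only. The weights actually used by the published methods are not stated in the abstract.

```python
import math


def chi2_sf_even(x, df):
    """Chi-square survival function P(X > x); closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_isf_even(p, df):
    """Inverse survival function: find x with sf(x) = p by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > p:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lancaster_combine(pvalues, weights):
    """Combine p-values for one gene; weights act as chi-square df (assumed even)."""
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = sum(chi2_isf_even(p, w) for p, w in zip(pvalues, weights))
    return chi2_sf_even(stat, sum(weights))


# Hypothetical p-values for one gene from four DE methods, with hypothetical weights.
combined = lancaster_combine([0.01, 0.04, 0.20, 0.03], [2, 2, 4, 2])
print(f"combined p-value: {combined:.6g}")
```

When every weight equals 2, this reduces to Fisher's method, which gives a convenient sanity check. In practice a statistics library with a general chi-square quantile function would replace the two helper functions and lift the even-weight restriction.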