Carels Nicolas
Laboratory of Biological System Modeling, Center of Technological Development in Health (CDTS), Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro 21040-900, RJ, Brazil.
Biology (Basel). 2024 Jun 28;13(7):482. doi: 10.3390/biology13070482.
RNA-seq analysis remains challenging because data processing workflows keep multiplying and none has yet been standardized. It is therefore important to determine which method most effectively preserves biological information. Here, we used Shannon entropy as a tool for depicting the biological status of a system. We assessed how the measured Shannon entropy varies across several RNA-seq workflows, such as DESeq2 and edgeR, and across nine normalization methods combined with log fold change, using paired TCGA RNA-seq samples from 515 patients spanning 12 cancer types with 5-year overall survival rates ranging from 20% to 98%. Our analysis revealed that TPM, RLE, and TMM normalization, coupled with a log fold change threshold of ≥1 for identifying differentially expressed genes, yielded the best results. We propose that Shannon entropy can serve as an objective metric for optimizing RNA-seq workflows and mRNA sequencing technologies.
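The steps named in the abstract (normalization, log fold change on paired samples, a threshold of ≥1, and Shannon entropy) can be illustrated with a minimal sketch. It is not the authors' pipeline. The abstract does not say over which distribution the entropy is computed, so taking it over the tumor TPM values of the selected genes is an assumption made here purely for illustration. The base-2 logarithm, the pseudocount, the function names, and the toy counts are likewise hypothetical.

```python
import math

def tpm(counts, lengths_kb):
    """Transcripts per million: divide counts by gene length, then scale to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def log2_fold_change(tumor, control, pseudocount=1.0):
    """Per-gene log2(tumor / control); the pseudocount (an assumption) avoids log(0)."""
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(tumor, control)]

def shannon_entropy(weights):
    """Shannon entropy in bits of the distribution obtained by normalizing weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical toy data: one paired sample (tumor vs. control), four genes.
lengths_kb = [1.0, 2.0, 0.5, 4.0]
control_counts = [100, 400, 50, 800]
tumor_counts = [500, 200, 250, 200]

control_tpm = tpm(control_counts, lengths_kb)
tumor_tpm = tpm(tumor_counts, lengths_kb)

# Genes called up-regulated with the threshold from the abstract (log fold change >= 1).
lfc = log2_fold_change(tumor_tpm, control_tpm)
up_genes = [i for i, value in enumerate(lfc) if value >= 1.0]

# Illustrative choice only: entropy over the tumor TPM of the selected genes.
entropy = shannon_entropy([tumor_tpm[i] for i in up_genes])
print(up_genes, entropy)
```

Swapping `tpm` for another normalization while keeping the rest fixed would show how the normalization choice changes both the selected genes and the resulting entropy, which is the kind of comparison the abstract describes.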