Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, 10461, NY, USA.
Department of Mathematics, Columbia University, 2990 Broadway, New York, 10027, NY, USA.
BMC Bioinformatics. 2019 Dec 20;20(Suppl 24):668. doi: 10.1186/s12859-019-3252-0.
Skewness is an under-utilized statistical measure that captures the degree of asymmetry in the distribution of any dataset. This study applied a new metric based on skewness to identify regulators or genes that have outlier expression in large patient cohorts.
We investigated whether specific patterns of skewed expression were related to the enrichment of biological pathways or to genomic properties such as DNA methylation status. Our study used publicly available datasets generated on both RNA-sequencing and microarray platforms. To allow comparison, the selected datasets included samples derived from both control donors and cancer patients. When comparing the shift in expression skewness between cancer and control datasets, we observed an enrichment of pathways related to immune function, reflecting a shift towards positive skewness in the cancer datasets relative to the control datasets. A significant correlation was also detected between expression skewness and the top 500 genes with the most significant differential DNA methylation in promoter regions, across four Cancer Genome Atlas cancer cohorts.
Our results indicate that expression skewness can reveal new insights into transcription based on outlier and asymmetrical behaviour in large patient cohorts.
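As an illustration of the quantity discussed above, the following is a minimal sketch of how per-gene skewness and its shift between two cohorts might be computed. It uses the standard moment-based (Fisher-Pearson) skewness coefficient, which is an assumption made for illustration; the abstract does not specify the study's skewness-based metric, so this should not be read as the authors' method. The function names `sample_skewness` and `skewness_shift`, and the gene labels in the usage note, are hypothetical.

```python
from math import fsum


def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness: m3 / m2**1.5.

    Assumption: this is the textbook coefficient, not necessarily
    the metric used in the study.
    """
    xs = [float(v) for v in values]
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one expression value")
    mean = fsum(xs) / n
    m2 = fsum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = fsum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant expression: treat as symmetric
    return m3 / m2 ** 1.5


def skewness_shift(cancer, control):
    """Per-gene skewness in cancer minus skewness in control.

    Both arguments map a gene name to a list of expression values
    across samples; only genes present in both cohorts are compared.
    """
    shared = cancer.keys() & control.keys()
    return {
        gene: sample_skewness(cancer[gene]) - sample_skewness(control[gene])
        for gene in shared
    }
```

For example, `skewness_shift({"GENE_A": [1, 1, 1, 1, 10]}, {"GENE_A": [1, 2, 3, 4, 5]})` gives a positive value for the hypothetical `GENE_A`, because a single high-expression outlier in the first cohort produces positive skewness while the second cohort is symmetric.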