Sanchez-Taltavull Daniel, Perkins Theodore J, Dommann Noelle, Melin Nicolas, Keogh Adrian, Candinas Daniel, Stroka Deborah, Beldi Guido
Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Department for BioMedical Research, University of Bern, Murtenstrasse 35, 3008 Bern, Switzerland.
Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada.
NAR Genom Bioinform. 2020 Jan 24;2(1):lqaa002. doi: 10.1093/nargab/lqaa002. eCollection 2020 Mar.
Assessing similarity is central to bioinformatics algorithms that determine correlations between biological information. A common problem is that similarity can appear by chance, particularly for lowly expressed entities. This is especially relevant in single-cell RNA-seq (scRNA-seq) data, where read counts are much lower than in bulk RNA-seq. Recently, a Bayesian correlation scheme that assigns low similarity to genes with low-confidence expression estimates was proposed for assessing similarity in bulk RNA-seq. Our goal is to extend Bayesian correlation to scRNA-seq data by considering three ways to compute similarity. First, we compute the similarity of pairs of genes over all cells. Second, we identify specific cell populations and compute the correlation within those populations. Third, we compute the similarity of pairs of genes over all clusters by considering the total mRNA expression. We demonstrate that Bayesian correlations are more reproducible than Pearson correlations and depend less on the number of input cells. We show that the Bayesian correlation algorithm assigns high similarity values to genes that are biologically relevant in a specific population. We conclude that Bayesian correlation is a robust similarity measure for scRNA-seq data.
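The idea of down-weighting low-confidence genes can be illustrated with a minimal sketch. It assumes a Beta prior on each gene's expression fraction in each cell and adds the posterior variance to the denominator of the correlation; the abstract does not state the authors' exact formulation, so the functions `bayesian_correlation` and `pool_by_cluster`, the parameters `alpha0` and `beta0`, and the toy data are illustrative assumptions only.

```python
import numpy as np

def bayesian_correlation(counts, alpha0=1.0, beta0=1.0):
    """Sketch of a Bayesian correlation on a genes x cells count matrix.

    Assumption (not taken from the abstract): the fraction of reads of
    gene i in cell c has a Beta(alpha0, beta0) prior, updated with the
    observed counts. Posterior uncertainty inflates the variance term.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)      # reads per cell
    alpha = alpha0 + counts                         # posterior Beta parameters
    beta = beta0 + totals - counts
    n = alpha + beta
    post_mean = alpha / n
    post_var = alpha * beta / (n ** 2 * (n + 1.0))

    centered = post_mean - post_mean.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / counts.shape[1]
    # Variance = spread of posterior means + average posterior uncertainty.
    var = (centered ** 2).mean(axis=1) + post_var.mean(axis=1)
    return cov / np.sqrt(np.outer(var, var))

def pool_by_cluster(counts, labels):
    """Sum counts over the cells of each cluster (genes x clusters)."""
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    return np.stack(
        [counts[:, labels == k].sum(axis=1) for k in np.unique(labels)],
        axis=1,
    )

# Toy data: two highly expressed genes, two identical low-count genes,
# and one background gene that dominates the library size.
high = np.array([100, 200, 300, 400, 500, 600])
low = np.array([0, 1, 0, 1, 0, 1])
counts = np.vstack([high, 2 * high, low, low, np.full(6, 10000)])

pearson = np.corrcoef(counts[:4])      # plain Pearson on raw counts
r = bayesian_correlation(counts)       # similarity over all cells
labels = np.array([0, 0, 1, 1, 2, 2])  # hypothetical cluster assignment
r_clusters = bayesian_correlation(pool_by_cluster(counts, labels))
```

In this toy example the two low-count genes are perfectly correlated under Pearson, while the sketch assigns them a much lower similarity than the two highly expressed genes, because their posterior variance is large relative to their signal. Note that under this formulation a gene's similarity with itself is below 1.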