Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Cell Rep Methods. 2022 Nov 11;2(11):100332. doi: 10.1016/j.crmeth.2022.100332. eCollection 2022 Nov 21.
Markers are increasingly used in high-throughput data analysis and experimental design. Examples include assigning cell types in scRNA-seq studies, deconvolving bulk gene expression data, and selecting marker proteins for single-cell spatial proteomics studies. Most marker selection methods rely on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-scale datasets in which several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker-selection tasks suggests that these methods can yield solutions that accurately distinguish the different phenotypes in the data.
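The abstract does not spell out the PC formulation, so the following is only an illustrative sketch of a generic cover-style selection, not the authors' algorithm. It assumes, hypothetically, that each marker is already annotated with the phenotype pairs it can tell apart, and it greedily picks markers until every pair is separated. The function name `greedy_pair_cover`, the toy genes, and the toy phenotype labels are all invented for illustration.

```python
from itertools import combinations


def greedy_pair_cover(distinguishes, phenotypes):
    """Greedy set-cover heuristic over phenotype pairs (illustrative only).

    distinguishes: dict mapping a marker name to the set of
        frozenset({a, b}) phenotype pairs that marker separates
        (an assumed input; how to derive it is not covered here).
    phenotypes: list of phenotype labels.
    Returns (chosen markers in pick order, pairs left uncovered).
    """
    uncovered = {frozenset(p) for p in combinations(phenotypes, 2)}
    chosen = []
    if not distinguishes:
        return chosen, uncovered
    while uncovered:
        # Pick the marker that separates the most still-uncovered pairs;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(distinguishes),
                   key=lambda m: len(distinguishes[m] & uncovered))
        gain = distinguishes[best] & uncovered
        if not gain:
            break  # no marker separates any of the remaining pairs
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical toy input: three phenotypes, three candidate markers.
phenotypes = ["T", "B", "NK"]
toy = {
    "geneA": {frozenset({"T", "B"}), frozenset({"T", "NK"})},
    "geneB": {frozenset({"B", "NK"})},
    "geneC": {frozenset({"T", "B"})},
}
chosen, uncovered = greedy_pair_cover(toy, phenotypes)
print(chosen, uncovered)
```

In this toy case a marker such as `geneC` could rank well in a per-phenotype DE analysis and still add nothing once `geneA` is selected, which is the kind of redundancy a cover-style objective is meant to avoid.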