Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA.
Bioinformatics. 2020 Feb 1;36(3):782-788. doi: 10.1093/bioinformatics/btz619.
Patterns of gene expression, quantified at the level of tissue or cells, can shed light on the etiology of disease. Rich resources now exist for tissue-level (bulk) gene expression data, collected from thousands of subjects, and resources for single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. scRNA-seq data provide cell type information, but they can be noisy and are typically derived from a small number of subjects.
Complementing these approaches, we develop an empirical Bayes method to estimate subject- and cell-type-specific (CTS) gene expression from tissue by borrowing information across multiple measurements of the same tissue per subject (e.g. multiple regions of the brain). Applying it to expression data from multiple brain regions from the Genotype-Tissue Expression (GTEx) project reveals CTS expression, which in turn permits downstream analyses such as the identification of CTS expression quantitative trait loci (eQTL).
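To make the idea of borrowing information across measurements concrete, the following is a minimal sketch rather than the MIND implementation. It assumes a linear mixing model that the abstract does not spell out: for one gene and one subject, the bulk expression x across T regions is x = W a + e, where W (T x K) holds cell type fractions, a is the subject's CTS expression with prior a ~ N(mu, Sigma), and e ~ N(0, sigma2 I). All names here (posterior_mean_2ct, W, mu, Sigma, sigma2) are hypothetical, the hyperparameters are taken as given rather than estimated from the data, and K is fixed at 2 to keep the linear algebra explicit.

```python
# Illustrative sketch only: not the MIND package's API or fitting procedure.
# Assumed model for one gene and one subject:
#   x = W a + e,  a ~ N(mu, Sigma),  e ~ N(0, sigma2 * I)
# x: bulk expression in T regions; W: T x K cell type fractions;
# a: the subject's K cell-type-specific expression values.

def posterior_mean_2ct(x, W, mu, Sigma, sigma2):
    """Posterior mean of a for K = 2 cell types, hyperparameters given."""
    # Prior precision: inverse of the 2 x 2 covariance Sigma.
    (s11, s12), (s21, s22) = Sigma
    det_s = s11 * s22 - s12 * s21
    P = [[s22 / det_s, -s12 / det_s], [-s21 / det_s, s11 / det_s]]

    # Data precision W'W / sigma2 and data term W'x / sigma2.
    wtw = [[sum(r[i] * r[j] for r in W) / sigma2 for j in range(2)]
           for i in range(2)]
    wtx = [sum(r[i] * xi for r, xi in zip(W, x)) / sigma2 for i in range(2)]

    # Posterior precision A = W'W/sigma2 + Sigma^{-1}, right-hand side b.
    A = [[wtw[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    b = [wtx[i] + P[i][0] * mu[0] + P[i][1] * mu[1] for i in range(2)]

    # Solve the 2 x 2 system A a = b by Cramer's rule.
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [
        (b[0] * A[1][1] - A[0][1] * b[1]) / det_a,
        (A[0][0] * b[1] - A[1][0] * b[0]) / det_a,
    ]


# Made-up example: three regions with different cell type fractions.
W = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
true_a = [10.0, 4.0]
x = [row[0] * true_a[0] + row[1] * true_a[1] for row in W]  # noise-free bulk

estimate = posterior_mean_2ct(x, W, mu=[6.0, 6.0],
                              Sigma=[[4.0, 0.0], [0.0, 4.0]], sigma2=0.5)
print(estimate)
```

Under these assumptions the estimate is a precision-weighted compromise: when the measurement noise is small relative to the prior variance it follows the per-subject data, and when the noise is large it shrinks toward the population mean mu. In an empirical Bayes analysis the hyperparameters would be estimated from all subjects rather than supplied by hand.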
The method is implemented in the R package MIND, hosted at https://github.com/randel/MIND.
Supplementary data are available at Bioinformatics online.