Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, MOE Key Laboratory of Major Diseases in Children, Genetics and Birth Defects Control Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China.
Bioinformatics. 2020 Jun 1;36(12):3788-3794. doi: 10.1093/bioinformatics/btaa239.
Gene expression profiling is widely used in basic and cancer research but is still not feasible in many clinical applications because some tissues, such as brain samples, are difficult and unethical to collect. Gene expression in uncollected tissues can be computationally inferred using genotype and expression quantitative trait loci. However, no existing method can infer unmeasured gene expression in multiple tissues using the gene expression profile of a single tissue as input.
Here, we present a Bayesian ridge regression-based method (B-GEX) to infer gene expression profiles of multiple tissues from a blood gene expression profile. For each gene in a tissue, a low-dimensional feature vector was extracted from the whole blood gene expression profile by feature selection. We used GTEx RNA-seq data of 16 tissues to train inference models that capture the cross-tissue expression correlations between each target gene in a tissue and its preselected feature genes in peripheral blood. We compared B-GEX with least-squares regression, LASSO regression and ridge regression. B-GEX outperforms the other three models in most tissues in terms of mean absolute error, Pearson correlation coefficient and root-mean-squared error. Moreover, B-GEX infers the expression levels of tissue-specific genes as well as those of non-tissue-specific genes in all tissues. Unlike previous methods, which require genomic features or gene expression profiles of multiple tissues, our model requires only a whole blood expression profile as input. B-GEX helps gain insights into gene expression in uncollected tissues from more accessible blood data.
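The per-gene workflow described above (select a few blood feature genes, fit a regularized linear model, score with mean absolute error, Pearson correlation and root-mean-squared error) can be illustrated with a minimal sketch. This is not the B-GEX implementation. It assumes synthetic data, feature selection by absolute Pearson correlation, and a Bayesian ridge posterior mean with a fixed precision ratio, whereas a full Bayesian ridge fit would also estimate the precisions from the data. All function names, the value of `k`, and the simulated genes are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def select_features(blood, target, k):
    """Indices of the k blood genes with the largest |correlation| to the target."""
    n_genes = len(blood[0])
    score = [abs(pearson([row[j] for row in blood], target)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: -score[j])[:k]

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][j] * w[j] for j in range(r + 1, n))) / m[r][r]
    return w

def fit_bayesian_ridge(x, y, ratio=1.0):
    """Posterior-mean weights under a zero-mean Gaussian prior.

    ratio = prior precision / noise precision, held fixed here (an assumption);
    with fixed precisions the posterior mean equals a ridge solution.
    """
    d = len(x[0])
    mx = [sum(row[j] for row in x) / len(x) for j in range(d)]
    my = sum(y) / len(y)
    xc = [[row[j] - mx[j] for j in range(d)] for row in x]  # centered features
    yc = [v - my for v in y]                                # centered target
    gram = [[sum(r[i] * r[j] for r in xc) + (ratio if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(d)]
    w = solve(gram, rhs)
    intercept = my - sum(wi * mi for wi, mi in zip(w, mx))
    return w, intercept

def predict(x, w, intercept):
    return [intercept + sum(wi * xi for wi, xi in zip(w, row)) for row in x]

def mae(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

# Synthetic stand-in for paired blood/tissue profiles (not GTEx data):
# one target gene in a tissue depends on blood genes 3 and 7 plus noise.
random.seed(0)
n_samples, n_genes = 120, 20
blood = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
tissue = [2.0 * row[3] - 1.5 * row[7] + random.gauss(0, 0.1) for row in blood]

train, test = slice(0, 80), slice(80, n_samples)
selected = select_features(blood[train], tissue[train], k=5)
x_train = [[row[j] for j in selected] for row in blood[train]]
x_test = [[row[j] for j in selected] for row in blood[test]]

w, intercept = fit_bayesian_ridge(x_train, tissue[train])
pred = predict(x_test, w, intercept)
print("selected blood genes:", selected)
print("MAE:", mae(pred, tissue[test]))
print("Pearson r:", pearson(pred, tissue[test]))
print("RMSE:", rmse(pred, tissue[test]))
```

In this sketch one such model is fitted per target gene and tissue, and feature selection uses training samples only so that the held-out metrics are not inflated by information from the test samples.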
B-GEX is available at https://github.com/xuwenjian85/B-GEX.
Supplementary data are available at Bioinformatics online.