Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group of the Instituto de Investigación Sanitaria de Santiago (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706 Galicia, Spain.
Translational Pediatrics and Infectious Diseases Unit, and GENVIP Research Group (www.genvip.org) of the Instituto de Investigación Sanitaria de Santiago (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), 15706 Galicia, Spain.
RNA. 2019 Jul;25(7):857-868. doi: 10.1261/rna.070052.118. Epub 2019 Apr 22.
There is a growing body of evidence suggesting that patterns of gene expression vary within and between human populations. However, the impact of this variation on human disease has been poorly explored, in part owing to the lack of a standardized protocol for estimating biogeographical ancestry from gene expression studies. Here we examine several studies that provide new, solid evidence that the ancestral background of individuals affects gene expression patterns. Next, we test a procedure to infer genetic ancestry from RNA-seq data in 25 data sets for which information on ethnicity was reported. Genome data of reference continental populations retrieved from The 1000 Genomes Project were used for comparisons. Remarkably, only eight of the 25 data sets passed FastQC default filters. We demonstrate that, for these eight population sets, the ancestral background of donors could be inferred very efficiently, even in data sets including samples with complex patterns of admixture (e.g., American-admixed populations). For most of the gene expression data sets of suboptimal quality, ancestral inference yielded odd patterns. The present study thus sounds a cautionary note for gene expression studies, highlighting the importance of controlling for the potential confounding effect of ancestral genetic background.