School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia.
Melbourne Integrative Genomics, University of Melbourne, Parkville, Victoria, Australia.
Genome Biol Evol. 2021 Nov 5;13(11). doi: 10.1093/gbe/evab247.
Chimpanzees (Pan troglodytes) are a genetically diverse species, consisting of four highly distinct subspecies. As humans' closest living relative, they have been a key model organism in the study of human evolution, and comparisons of human and chimpanzee transcriptomes have been widely used to characterize differences in gene expression levels that could underlie the phenotypic differences between the two species. However, the subspecies from which these transcriptomic data sets have been derived is not recorded in metadata available in the public NCBI Sequence Read Archive (SRA). Furthermore, labeling of RNA sequencing (RNA-seq) samples is for the most part inconsistent across studies, and the true number of individuals from whom transcriptomic data are available is difficult to ascertain. Thus, we have evaluated genetic diversity at the subspecies and individual level in 486 public RNA-seq samples available in the SRA, spanning the vast majority of public chimpanzee transcriptomic data. Using multiple population genetics approaches, we find that nearly all samples (96.6%) have some degree of Western chimpanzee ancestry. At the individual donor level, we identify multiple samples that have been repeatedly analyzed across different studies and identify a total of 135 genetically distinct individuals within our data, a number that falls to 89 when we exclude likely first- and second-degree relatives. Altogether, our results show that current transcriptomic data from chimpanzees are capturing low levels of genetic diversity relative to what exists in wild chimpanzee populations. These findings provide important context to current comparative transcriptomics research involving chimpanzees.