Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA, 98195, USA.
Molecular and Cellular Biology PhD Program, University of Washington, Seattle, WA, 98195, USA.
Sci Rep. 2021 Jan 11;11(1):449. doi: 10.1038/s41598-020-80049-y.
The ribosomal RNA genes (rDNA) are tandemly arrayed in most eukaryotes and exhibit vast copy number variation. There is growing interest in integrating this variation into genotype-phenotype associations. Here, we explored a possible association of rDNA copy number variation with autism spectrum disorder and found no difference between probands and unaffected siblings. Because short-read sequencing estimates of rDNA copy number are error prone, we sought to validate our 45S estimates. Previous studies reported tightly correlated, concerted copy number variation between the 45S and 5S arrays, which should enable the validation of 45S copy number estimates with pulsed-field gel-verified 5S copy numbers. Here, we show that the previously reported strong concerted copy number variation may be an artifact of variable data quality in the earlier published 1000 Genomes Project sequences. We failed to detect a meaningful correlation between 45S and 5S copy numbers in thousands of samples from the high-coverage Simons Simplex Collection dataset as well as in the recent high-coverage 1000 Genomes Project sequences. Our findings illustrate the challenge of genotyping repetitive DNA regions accurately and call into question the accuracy of recently published studies of rDNA copy number variation in cancer that relied on diverse publicly available resources for sequence data.