Department of Biomedical Informatics, Vanderbilt University, 2220 Pierce Ave, 571 PRB, Nashville, TN, 37027, USA.
Vanderbilt Genetics Institute, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, TN, USA.
BMC Genomics. 2017 Oct 3;18(Suppl 6):690. doi: 10.1186/s12864-017-4022-x.
High-throughput sequencing technology enables both the human genome and transcriptome to be screened at single-nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine single-nucleotide differences among five sources of high-throughput sequencing data generated from the same individual: exome sequencing from blood, tumor, and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue.
Through careful quality control and analysis of the SNVs, we found little difference between DNA-DNA pairs (1%-2%). Between DNA-RNA pairs, however, SNV differences ranged from 10% to 20%.
Only a small portion of these differences can be explained by RNA editing. Instead, the majority of the DNA-RNA differences should be attributed to technical errors from sequencing and post-processing of RNAseq data. Our results suggest that SNV detection using RNAseq is subject to a high false-positive rate.
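The pairwise SNV differences reported above could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the abstract does not define how "difference" was measured, so the code assumes it is the fraction of sites called in both samples whose genotypes disagree. The sample names, the sites, the genotype strings, and the functions `discordance` and `pairwise_discordance` are all hypothetical.

```python
from itertools import combinations


def discordance(calls_a, calls_b):
    """Fraction of sites called in both samples whose genotypes differ.

    Assumption: each call set maps (chromosome, position) -> genotype string.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    mismatches = sum(1 for site in shared if calls_a[site] != calls_b[site])
    return mismatches / len(shared)


def pairwise_discordance(samples):
    """Discordance for every unordered pair of samples, keyed by sorted names."""
    return {
        (a, b): discordance(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Toy, made-up genotype calls for illustration only (not data from the study).
toy_samples = {
    "blood_exome": {
        ("chr1", 100): "A/G",
        ("chr1", 200): "C/C",
        ("chr2", 50): "T/T",
        ("chr2", 75): "G/A",
    },
    "tumor_rnaseq": {
        ("chr1", 100): "A/G",
        ("chr1", 200): "C/T",  # disagrees with the exome call
        ("chr2", 50): "T/T",
        ("chr3", 10): "A/A",   # not called in the exome, so ignored
    },
}

toy_result = pairwise_discordance(toy_samples)
```

Restricting the comparison to sites called in both samples keeps coverage gaps, such as genes not expressed in the RNAseq data, from being counted as differences. A real comparison would also need the quality-control filtering that the abstract mentions.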