Center for Human Genetics Research, Vanderbilt University, Nashville, TN, 37232, USA.
Trends Genet. 2013 Oct;29(10):593-9. doi: 10.1016/j.tig.2013.07.006. Epub 2013 Aug 22.
Exome sequencing is one of the most cost-efficient sequencing approaches for conducting genome research on coding regions. However, a significant portion of the reads obtained in exome sequencing comes from outside the designed target regions. These additional reads are generally ignored, potentially wasting an important source of genomic data. Three major types of unintentionally sequenced reads can be found in exome sequencing data: reads in introns and intergenic regions, reads in the mitochondrial genome, and reads originating in viral genomes. All of these can be used for reliable data mining, extending the utility of exome sequencing. Large-scale exome sequencing data repositories, such as The Cancer Genome Atlas (TCGA), the 1000 Genomes Project, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project, and the Sequence Read Archive (SRA), provide researchers with excellent secondary data-mining opportunities to study genomic data beyond the intended target regions.
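As a rough illustration of the read categories named in the abstract, the following minimal sketch sorts aligned reads into on-target, off-target nuclear (intronic or intergenic), mitochondrial, and unmapped bins. It is not the authors' method. The target intervals, the contig names `chrM` and `MT`, the function names, and the toy reads are all assumptions made for this example, and each read is reduced to a single start coordinate. Unmapped reads are only candidates for viral origin; confirming that would require a separate alignment against viral reference genomes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical capture design: contig -> sorted, non-overlapping,
# half-open (start, end) target intervals. Real designs come from a BED file.
TARGETS = {
    "chr1": [(100, 200), (500, 650)],
}

# Assumed names for the mitochondrial contig; these vary by reference build.
MITO_NAMES = {"chrM", "MT"}


def in_target(chrom, pos, targets):
    """Return True if pos falls inside a target interval on chrom."""
    intervals = targets.get(chrom, [])
    starts = [start for start, _ in intervals]
    # Index of the last interval whose start is <= pos (or -1 if none).
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < intervals[i][1]


def classify_read(chrom, pos, targets):
    """Assign one read (contig, start position) to a coarse category."""
    if chrom is None:
        return "unmapped"  # candidate for viral origin, not proof of it
    if chrom in MITO_NAMES:
        return "mitochondrial"
    if in_target(chrom, pos, targets):
        return "on_target"
    return "off_target_nuclear"  # intronic or intergenic


# Toy reads as (contig, start) pairs; None marks an unmapped read.
reads = [("chr1", 150), ("chr1", 300), ("chrM", 42), (None, None), ("chr2", 10)]
counts = Counter(classify_read(c, p, TARGETS) for c, p in reads)
print(dict(counts))
```

In practice the same tally would be driven by an alignment file and the capture kit's target list rather than hard-coded tuples, and a read would be compared to targets by its full aligned span instead of a single coordinate.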