Price Adam, Garhyan Jaishree, Gibas Cynthia
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States of America.
Jawaharlal Nehru University, New Delhi, India.
PLoS One. 2017 Feb 28;12(2):e0173023. doi: 10.1371/journal.pone.0173023. eCollection 2017.
High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is GC content bias, in which the GC content of a genomic region affects the number of reads produced from that region during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the underlying molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias.
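To make the notion of GC bias concrete, the following is a minimal sketch of how one might quantify the association between regional GC content and read count. It is not the authors' analysis pipeline: the function names, the non-overlapping windowing scheme, and the use of a Pearson correlation are all assumptions chosen for illustration.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 if seq is empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(genome: str, window: int) -> list:
    """GC fraction of consecutive non-overlapping windows.

    A trailing partial window is dropped (an illustrative choice).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_fraction(genome[i:i + window])
        for i in range(0, len(genome) - window + 1, window)
    ]


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def gc_read_correlation(genome: str, read_counts, window: int) -> float:
    """Correlate per-window GC fraction with per-window read counts.

    read_counts is assumed to hold one count per complete window.
    """
    return pearson(windowed_gc(genome, window), read_counts)
```

A correlation far from zero would indicate that read depth tracks GC content in the chosen windows. It would not, on its own, distinguish GC content from RNA secondary structure as the cause, which is the question the study addresses.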