Accurate sequencing of DNA motifs able to form alternative (non-B) structures.

Affiliations

Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.

Department of Operations and Decision Systems, Université Laval, Quebec, Quebec G1V0A6, Canada.

Publication information

Genome Res. 2023 Jun;33(6):907-922. doi: 10.1101/gr.277490.122. Epub 2023 Jul 11.

Abstract

Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
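The abstract does not spell out the probabilistic approach, so the following is only a minimal sketch of the general idea, assuming a simple binomial error model rather than the authors' actual method. It estimates how many motif sites would show a spurious variant in at least one sample, given a per-read error rate, a read depth, and a sample count. The function names, the `min_alt_reads` calling threshold, and every numeric value are hypothetical illustrations, not figures from the paper.

```python
from math import comb


def prob_false_call(depth, error_rate, min_alt_reads):
    """Probability that at least `min_alt_reads` of `depth` reads carry
    an error at one site, under an assumed binomial model with
    independent reads (an illustrative simplification)."""
    p_fewer = sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


def expected_false_positive_sites(n_sites, n_samples, depth,
                                  error_rate, min_alt_reads):
    """Expected number of sites with a false call in at least one of
    `n_samples` independent samples (hypothetical helper)."""
    p_sample = prob_false_call(depth, error_rate, min_alt_reads)
    p_site = 1.0 - (1.0 - p_sample) ** n_samples
    return n_sites * p_site


if __name__ == "__main__":
    # All values below are made up for illustration only.
    for n_samples in (10, 100, 1000):
        fp = expected_false_positive_sites(
            n_sites=1_000_000, n_samples=n_samples,
            depth=5, error_rate=0.001, min_alt_reads=2,
        )
        print(f"{n_samples} samples: about {fp:.0f} false-positive sites")
```

Under this assumed model, the expected count grows with the number of samples and shrinks as the calling threshold rises. That matches the abstract's point that low-read-depth studies and rare-variant calls are the most exposed to elevated error rates at non-B motifs.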

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a9f/10519405/00c887bc355f/907f01.jpg
