The effect of strand bias in Illumina short-read sequencing data.

Affiliation

Vanderbilt Ingram Cancer Center, Nashville, TN, USA.

Publication information

BMC Genomics. 2012 Nov 24;13:666. doi: 10.1186/1471-2164-13-666.

DOI: 10.1186/1471-2164-13-666

PMID: 23176052

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3532123/

Abstract

BACKGROUND

When using Illumina high-throughput short-read data, the genotypes inferred from the positive and negative strands are sometimes significantly different, with one homozygous and the other heterozygous. This phenomenon is known as strand bias. In this study, we used Illumina short-read sequencing data to evaluate the effect of strand bias on genotyping quality and to explore its possible causes.
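The phenomenon described above can be made concrete with a small sketch. It assumes per-strand allele counts at a single site (reference and alternate reads on the forward and reverse strands), calls a genotype on each strand with a naive allele-fraction rule, and flags sites where one strand looks homozygous and the other heterozygous. The thresholds (`het_low`, `het_high`, `min_depth`) and all function names are hypothetical illustrations, not the genotype caller used in the study. The sketch also includes a two-sided Fisher's exact test on the 2×2 strand-by-allele table, a common generic way to quantify strand bias (GATK's FisherStrand annotation, for example, is based on this test); it is not presented as the study's own strand bias measure.

```python
from math import comb


def call_genotype(ref: int, alt: int, het_low: float = 0.2,
                  het_high: float = 0.8, min_depth: int = 4) -> str:
    """Naive, hypothetical caller: genotype from the alternate-allele fraction."""
    depth = ref + alt
    if depth < min_depth:
        return "./."  # too few reads on this strand to make a call
    alt_fraction = alt / depth
    if alt_fraction < het_low:
        return "0/0"  # homozygous reference
    if alt_fraction > het_high:
        return "1/1"  # homozygous alternate
    return "0/1"      # heterozygous


def strand_discordant(ref_fwd: int, alt_fwd: int, ref_rev: int, alt_rev: int) -> bool:
    """True when one strand calls homozygous and the other heterozygous."""
    g_fwd = call_genotype(ref_fwd, alt_fwd)
    g_rev = call_genotype(ref_rev, alt_rev)
    if "./." in (g_fwd, g_rev):
        return False  # cannot compare if either strand is uncallable
    homozygous = {"0/0", "1/1"}
    return (g_fwd in homozygous) != (g_rev in homozygous)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are strands (forward, reverse); columns are alleles (ref, alt).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Forward strand: 20 ref / 0 alt (looks 0/0); reverse: 10 ref / 10 alt (looks 0/1).
    print(strand_discordant(20, 0, 10, 10))
    print(fisher_exact_two_sided(20, 0, 10, 10))
```

Run as a script, it flags the example site as strand-discordant and reports a Fisher p-value well below 0.001. In practice the four counts would come from a pileup of the aligned reads; the fixed thresholds here are only for illustration.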

RESULTS

We collected 22 breast cancer samples from 22 patients and sequenced their exomes using the Illumina GAIIx platform. By comparing the genotypes inferred from the sequencing data with those inferred from SNP chip data, we found that SNPs with extreme strand bias did not have significantly lower concordance rates than SNPs with low or no strand bias. However, this result may be limited by the small subset of SNPs present in both the exome sequencing and the SNP chip data. We further compared the transition/transversion (Ti/Tv) ratio and the number of novel non-synonymous SNPs between SNPs with low or no strand bias and those with extreme strand bias, and found that SNPs with low or no strand bias had better overall quality. We also found that strand bias occurs at random genomic positions in these samples, with no consistent pattern of strand bias locations across samples. Comparing results from two different aligners, BWA and Bowtie, we observed highly consistent strand bias patterns, so strand bias is unlikely to be caused by alignment artifacts. We replicated these results in two additional independent datasets generated with different capture methods and Illumina sequencers.
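Two of the quality checks mentioned above, the Ti/Tv ratio and genotype concordance against SNP chip calls, reduce to simple counting. The sketch below assumes SNPs are given as (reference, alternate) base pairs and genotypes as two-letter strings keyed by a site identifier, with sequencing and chip genotypes already reported on the same strand; the function names and input layout are hypothetical, not taken from the study.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """Transition = purine<->purine or pyrimidine<->pyrimidine substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps: list[tuple[str, str]]) -> float:
    """Ratio of transitions to transversions in a list of (ref, alt) SNPs."""
    if not snps:
        return float("nan")
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


def concordance_rate(seq_calls: dict[str, str], chip_calls: dict[str, str]) -> float:
    """Fraction of sites shared by both call sets whose genotypes agree.

    Genotypes are unordered, so "AG" and "GA" count as the same call.
    """
    shared = seq_calls.keys() & chip_calls.keys()
    if not shared:
        return float("nan")
    agree = sum(1 for site in shared
                if sorted(seq_calls[site].upper()) == sorted(chip_calls[site].upper()))
    return agree / len(shared)


if __name__ == "__main__":
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(concordance_rate({"rs1": "AG", "rs2": "CC"}, {"rs1": "GA", "rs2": "CT"}))
```

For context, random substitution errors are expected to give a Ti/Tv near 0.5 (each base has one possible transition and two possible transversions), while real human SNPs are commonly reported at roughly 2 genome-wide and higher in exomes, so a lower Ti/Tv in one SNP subset is usually read as a sign of more false positives. The concordance rate can only be computed over sites present on the chip, which is why a small overlap limits that comparison.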

CONCLUSION

Extreme strand bias indicates a potentially high false-positive rate among SNPs.



Similar articles


Exome sequencing generates high quality data in non-target regions.

BMC Genomics. 2012 May 20;13:194. doi: 10.1186/1471-2164-13-194.

Comparison of solution-based exome capture methods for next generation sequencing.

Genome Biol. 2011 Sep 28;12(9):R94. doi: 10.1186/gb-2011-12-9-r94.

Multi-perspective quality control of Illumina exome sequencing data using QC3.

Genomics. 2014 May-Jun;103(5-6):323-8. doi: 10.1016/j.ygeno.2014.03.006. Epub 2014 Apr 3.

Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing.

BMC Genomics. 2012 Aug 22;13:417. doi: 10.1186/1471-2164-13-417.

Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing.

Mol Genet Metab. 2013 Sep-Oct;110(1-2):65-72. doi: 10.1016/j.ymgme.2013.06.004. Epub 2013 Jun 13.

A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data.

BMC Genomics. 2015 Dec 29;16:1109. doi: 10.1186/s12864-015-2192-y.

Accurate detection and genotyping of SNPs utilizing population sequencing data.

Genome Res. 2010 Apr;20(4):537-45. doi: 10.1101/gr.100040.109. Epub 2010 Feb 11.

Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.

Genome Biol. 2011 Nov 8;12(11):R112. doi: 10.1186/gb-2011-12-11-r112.

Use of SNP chips to detect rare pathogenic variants: retrospective, population based diagnostic evaluation.

BMJ. 2021 Feb 15;372:n214. doi: 10.1136/bmj.n214.

Cited by

Taming large-scale genomic analyses via sparsified genomics.

Nat Commun. 2025 Jan 21;16(1):876. doi: 10.1038/s41467-024-55762-1.

Emergence of carbapenem resistance in persistent Shewanella algae bacteremia: the role of pdsS G547W mutation in adaptive subpopulation dynamics.

Ann Clin Microbiol Antimicrob. 2024 Nov 20;23(1):102. doi: 10.1186/s12941-024-00759-3.

Genomic reproducibility in the bioinformatics era.

Genome Biol. 2024 Aug 9;25(1):213. doi: 10.1186/s13059-024-03343-2.

Improving rigor and reproducibility in chromatin immunoprecipitation assay data analysis workflows with Rocketchip.

bioRxiv. 2024 Jul 16:2024.07.10.602975. doi: 10.1101/2024.07.10.602975.

CD59 gene: 143 haplotypes of 22,718 nucleotides length by computational phasing in 113 individuals from different ethnicities.

Transfusion. 2024 Jul;64(7):1296-1305. doi: 10.1111/trf.17869. Epub 2024 May 30.

Crykey: Rapid identification of SARS-CoV-2 cryptic mutations in wastewater.

Nat Commun. 2024 May 28;15(1):4545. doi: 10.1038/s41467-024-48334-w.

Analysis of somatic mutations in whole blood from 200,618 individuals identifies pervasive positive selection and novel drivers of clonal hematopoiesis.

Nat Genet. 2024 Jun;56(6):1147-1155. doi: 10.1038/s41588-024-01755-1. Epub 2024 May 14.

High throughput AS LNA qPCR method for the detection of a specific mutation in poliovirus vaccine strains.

Vaccine. 2024 Apr 2;42(9):2475-2484. doi: 10.1016/j.vaccine.2024.01.103. Epub 2024 Mar 19.

Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads.

bioRxiv. 2024 Feb 29:2024.02.26.582089. doi: 10.1101/2024.02.26.582089.

Crykey: Rapid Identification of SARS-CoV-2 Cryptic Mutations in Wastewater.

medRxiv. 2023 Nov 12:2023.06.16.23291524. doi: 10.1101/2023.06.16.23291524.

