Local alignment of two-base encoded DNA sequence.

Authors

Homer Nils, Merriman Barry, Nelson Stanley F

Affiliation

Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA.

Publication

BMC Bioinformatics. 2009 Jun 9;10:175. doi: 10.1186/1471-2105-10-175.

Abstract

BACKGROUND

DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity.
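To make the deterministic decoding and the effect of a measurement error concrete, here is a minimal sketch. It assumes the commonly used XOR-based two-base ("color") code, with A, C, G, T mapped to 0 to 3; the abstract itself does not spell out the code table, and the function names `encode` and `decode` are hypothetical.

```python
# Assumed 2-bit base codes; each color is the XOR of two adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Return one color (0-3) per pair of adjacent bases in seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bits = BASE_TO_BITS[first_base]
    out = [first_base]
    for color in colors:
        bits ^= color  # the next base is fully determined by base + color
        out.append(BITS_TO_BASE[bits])
    return "".join(out)


colors = encode("ACGGT")            # [1, 3, 0, 1]
clean = decode("A", colors)         # "ACGGT"

# A single measurement error in one color alters every base after it.
noisy = list(colors)
noisy[1] = 2
corrupted = decode("A", noisy)      # "ACTTG"

# A true single-base variant instead changes two adjacent colors.
variant_colors = encode("ACTGT")    # [1, 2, 1, 1]
```

Under this assumed code, an isolated color change is more likely a measurement error, while two adjacent color changes are consistent with a real base substitution, which is why decoding and alignment benefit from being solved together.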

RESULTS

We present an extension of the standard dynamic programming method for local alignment that simultaneously decodes the data and performs the alignment. It maximizes a similarity score based on a weighted combination of errors and edits, and allows an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast them with standard DNA sequence alignment under the same conditions.
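For reference, the standard method being extended can be sketched as follows. This is ordinary nucleotide-space local alignment with affine gaps (the Smith-Waterman recurrence with Gotoh's gap matrices), not the authors' two-base-aware extension; the function name and the scoring values are illustrative assumptions, and only the best score is returned.

```python
def local_align_affine(a, b, match=2, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Best local alignment score of a and b with an affine gap penalty.

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); 0 restarts the alignment.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending with a gap that consumes b[j-1] only.
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # F: best score ending with a gap that consumes a[i-1] only.
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


identical = local_align_affine("ACGT", "ACGT")
unrelated = local_align_affine("AAAA", "TTTT")
one_gap = local_align_affine("ACGTACGT", "ACGTTACGT")
```

A joint decode-and-align method of the kind described above would need additional state per cell to track the decoded base and to charge for color errors; that part is not shown here.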

CONCLUSION

The new local alignment algorithm for two-base encoded data has substantial power to detect and correct measurement errors while identifying underlying sequence variants, thereby facilitating genome re-sequencing efforts based on this form of sequence data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20df/2709925/9cfc17579d82/1471-2105-10-175-5.jpg
