maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences.

Affiliation

Center for non-coding RNA in Technology and Health, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark.

Publication information

Bioinformatics. 2011 Feb 1;27(3):317-25. doi: 10.1093/bioinformatics/btq651. Epub 2010 Dec 1.

DOI:10.1093/bioinformatics/btq651

PMID:21123221

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3031029/

Abstract

MOTIVATION

Reconstructing a genomic sequence for a particular species is growing in importance, given both the rapid development of high-throughput sequencing technologies and their limitations. Applications include compensating for missing data in unsequenced genomic regions, designing oligonucleotide primers for target genes in species that lack sequence information, and preparing customized queries for homology searches.

RESULTS

We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed at a chosen confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase in reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also improves the quality of PCR primer design for as yet unsequenced genes: the difference between the expected T(m) and the real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset.
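The per-position probability computation described above can be illustrated with a minimal sketch. This is not the maxAlike implementation: it assumes the Jukes-Cantor (JC69) substitution model with a uniform base distribution, treats gaps as missing data, and uses a hypothetical tree encoding (a leaf is a species name; an internal node is a list of `(child, branch_length)` pairs). The tree is viewed from the target species, which is valid for any reversible model. The function names, species, branch lengths, and sequences are all made up for illustration.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """JC69 probabilities of (no change, one specific change) over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial(node, column):
    """Felsenstein pruning: likelihood of the leaves below `node` for each state at `node`."""
    if isinstance(node, str):                      # leaf: look up the observed character
        obs = column.get(node, "-").upper()
        if obs not in NUCS:                        # gap or ambiguity: treated as missing data
            return [1.0] * 4
        return [1.0 if n == obs else 0.0 for n in NUCS]
    like = [1.0] * 4
    for child, t in node:                          # internal node: combine all children
        same, diff = jc69(t)
        child_like = partial(child, column)
        total = sum(child_like)
        for i in range(4):
            like[i] *= same * child_like[i] + diff * (total - child_like[i])
    return like

def target_posterior(rest, t_target, column):
    """Posterior nucleotide probabilities for a target leaf attached to `rest` by t_target."""
    below = partial(rest, column)
    same, diff = jc69(t_target)
    total = sum(below)
    # Uniform prior (0.25) times the likelihood of the rest of the tree given the target state.
    weights = [0.25 * (same * below[i] + diff * (total - below[i])) for i in range(4)]
    z = sum(weights)
    return {n: w / z for n, w in zip(NUCS, weights)}

def reconstruct(rest, t_target, alignment, min_conf=0.9):
    """Call the most probable nucleotide per column, or 'N' below the confidence level."""
    length = len(next(iter(alignment.values())))
    out = []
    for i in range(length):
        column = {species: seq[i] for species, seq in alignment.items()}
        post = target_posterior(rest, t_target, column)
        best = max(post, key=post.get)
        out.append(best if post[best] >= min_conf else "N")
    return "".join(out)

# Hypothetical example: the target attaches next to "human"; "mouse" and "rat" are more distant.
rest = [("human", 0.05), ([("mouse", 0.1), ("rat", 0.1)], 0.2)]
alignment = {"human": "ACGT", "mouse": "ACGA", "rat": "ACTA"}
print(reconstruct(rest, 0.05, alignment, min_conf=0.85))
print(reconstruct(rest, 0.05, alignment, min_conf=0.5))
```

In this toy setting, columns where the close relative disagrees with the more distant species receive a lower posterior and are masked as `N` at the stricter confidence level, which mirrors the idea of reporting only nucleotides above a confidence limit.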

AVAILABILITY

maxAlike is available for download and as a web server at: http://rth.dk/resources/maxAlike.

Similar articles

BatchPrimer3: a high throughput web application for PCR and sequencing primer design.

BMC Bioinformatics. 2008 May 29;9:253. doi: 10.1186/1471-2105-9-253.

Primaclade--a flexible tool to find conserved PCR primers across multiple species.

Bioinformatics. 2005 Apr 1;21(7):1263-4. doi: 10.1093/bioinformatics/bti134. Epub 2004 Nov 11.

PUNS: transcriptomic- and genomic-in silico PCR for enhanced primer design.

Bioinformatics. 2004 Oct 12;20(15):2399-400. doi: 10.1093/bioinformatics/bth257. Epub 2004 Apr 8.

AutoDimer: a screening tool for primer-dimer and hairpin structures.

Biotechniques. 2004 Aug;37(2):226-31. doi: 10.2144/04372ST03.

RExPrimer: an integrated primer designing tool increases PCR effectiveness by avoiding 3' SNP-in-primer and mis-priming from structural variation.

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-10-S3-S4.

PerlPrimer: cross-platform, graphical primer design for standard, bisulphite and real-time PCR.

Bioinformatics. 2004 Oct 12;20(15):2471-2. doi: 10.1093/bioinformatics/bth254. Epub 2004 Apr 8.

Dual-genome primer design for construction of DNA microarrays.

Bioinformatics. 2005 Feb 1;21(3):325-32. doi: 10.1093/bioinformatics/bti001. Epub 2004 Aug 27.

AutoCloner: automatic homologue-specific primer design for full-gene cloning in polyploids.

BMC Bioinformatics. 2020 Jul 16;21(1):311. doi: 10.1186/s12859-020-03601-7.

Novel computational methods for increasing PCR primer design effectiveness in directed sequencing.

BMC Bioinformatics. 2008 Apr 11;9:191. doi: 10.1186/1471-2105-9-191.

Cited by

Comparative RNA Genomics.

Methods Mol Biol. 2024;2802:347-393. doi: 10.1007/978-1-0716-3838-5_12.

Evolution and Phylogeny of MicroRNAs - Protocols, Pitfalls, and Problems.

Methods Mol Biol. 2022;2257:211-233. doi: 10.1007/978-1-0716-1170-8_11.

phoD Alkaline Phosphatase Gene Diversity in Soil.

Appl Environ Microbiol. 2015 Oct;81(20):7281-9. doi: 10.1128/AEM.01823-15. Epub 2015 Aug 7.

One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees.

J Mol Evol. 2014 Oct;79(3-4):117-29. doi: 10.1007/s00239-014-9639-7. Epub 2014 Sep 4.

FastML: a web server for probabilistic reconstruction of ancestral sequences.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W580-4. doi: 10.1093/nar/gks498. Epub 2012 May 31.
