DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability.

Affiliation

Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, New York, United States of America.

Publication information

PLoS One. 2011;6(8):e20552. doi: 10.1371/journal.pone.0020552. Epub 2011 Aug 16.

Abstract

For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious method for identifying query sequences is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g., COI, rbcL), not all barcoding markers are as structurally conserved (e.g., matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g., BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g., mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types.
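
The idea of scoring variable segments between invariant flanks, per taxon and per taxonomic level, can be illustrated with a toy sketch. This is not the BRONX implementation or its scoring function: the flank pairs here are supplied by hand (BRONX derives them from the reference sequences), the score is a simple count of matching segments, and every name, sequence, and data structure below is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def extract_segment(seq, left, right):
    """Return the text between the first `left` flank and the next `right` flank, or None."""
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)
    end = seq.find(right, start)
    if end == -1:
        return None
    return seq[start:end]


def build_index(references, flanks, rank):
    """Map taxon -> flank pair -> set of variable segments observed in that taxon."""
    index = defaultdict(lambda: defaultdict(set))
    for ref in references:
        for pair in flanks:
            seg = extract_segment(ref["seq"], *pair)
            if seg is not None:
                index[ref[rank]][pair].add(seg)
    return index


def score_query(query, index, flanks):
    """Count, per taxon, the flank pairs whose query segment is an observed variant."""
    scores = {}
    for taxon, loci in index.items():
        score = 0
        for pair in flanks:
            seg = extract_segment(query, *pair)
            # Flank pairs missing from the query (partial overlap) are skipped.
            if seg is not None and seg in loci.get(pair, ()):
                score += 1
        scores[taxon] = score
    return scores


# Hypothetical flank pairs and toy reference sequences (not real barcode data).
flanks = [("ACGT", "TTGA"), ("GGCC", "AATT")]
references = [
    {"genus": "Quercus", "species": "Quercus alba", "seq": "ACGTCATTGAGGCCTAATT"},
    {"genus": "Quercus", "species": "Quercus alba", "seq": "ACGTCGTTGAGGCCTAATT"},
    {"genus": "Quercus", "species": "Quercus rubra", "seq": "ACGTCATTGAGGCCGAATT"},
    {"genus": "Acer", "species": "Acer rubrum", "seq": "ACGTGGTTGAGGCCCAATT"},
]

# One index per taxonomic level, so each level gets its own identification.
species_index = build_index(references, flanks, "species")
genus_index = build_index(references, flanks, "genus")

# A short query that covers only the first flank pair.
mini_query = "ACGTCATTGA"
species_scores = score_query(mini_query, species_index, flanks)
genus_scores = score_query(mini_query, genus_index, flanks)
print(species_scores)
print(genus_scores)
```

In this toy data the short query carries a segment shared by both Quercus species, so the species-level scores tie, while the genus-level scores still separate Quercus from Acer. Keeping a set of observed variants per taxon is one simple way to represent within-taxon variability; a real method would need a more careful scoring scheme.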

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be39/3156709/281b8bc8fe1a/pone.0020552.g001.jpg
