A novel sequence similarity searching and visualization method based on overlappingly translated nucleic acids: the blastNP.

Author information

Biro Jan C, Biro Josephine M K

Affiliation

Homulus Informatics, 88 Howard, # 1205, San Francisco, CA 94105, USA.

Publication information

Med Hypotheses. 2004;62(4):568-74. doi: 10.1016/j.mehy.2003.11.020.

PMID:15050109

Abstract

Sequence data are stored in nucleic acid and protein databases. Searching nucleic acid databases is a very specific but rather insensitive method. Searching protein databases is a sensitive but not very specific procedure. It was expected that a combination of these methods might provide an optimal approach. Therefore an alternative to TblastX, known as blastNP, has been developed. Nucleic acids in the database and the query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with blastP. Thus, each nucleic acid sequence is represented by a single "protein-like" sequence (instead of three hypothetical proteins in different reading frames). The blastNP method is defined as a blastP search performed on an overlappingly translated nucleic acid database using a similarly converted nucleic acid query. The specificity and sensitivity of blastNP and TblastX are very similar; however, blastNP is more sensitive in detecting short sequence similarities (fewer than 50 residues). BlastNP combines the advantages of nucleotide and protein BLAST searches and bypasses many difficulties: (1) it is more sensitive to weak sequence similarities than blastN, (2) codon redundancy is eliminated, (3) the sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced, and (4) it is insensitive to frame shifts. This novel method was shown to find significant sequence similarities that remained hidden to other methods, and it is a promising tool for further understanding (and annotating) the function of many old and new sequences.
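The overlapping translation step described above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that an OTS is produced by translating the codon starting at every nucleotide position (a sliding window of three bases moved one base at a time), that the standard genetic code is used, and that stop codons are written as `*`. The function name `overlapping_translate` and the `unknown` parameter are hypothetical, and the sketch covers only one strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def overlapping_translate(seq, unknown="X"):
    """Translate the codon starting at every position of `seq`.

    Assumption: this one-base sliding window is what the abstract means
    by an overlappingly translated sequence (OTS). A nucleotide sequence
    of length n gives a single protein-like string of length n - 2.
    Codons containing ambiguous bases are written as `unknown`.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], unknown) for i in range(len(seq) - 2)
    )


if __name__ == "__main__":
    ots = overlapping_translate("ATGGCCAAA")
    print(ots)        # one protein-like sequence covering all three frames
    print(ots[0::3])  # every third residue is the conventional frame-1 translation
```

Under these assumptions, the residues of all three forward reading frames are interleaved in one string, so an inserted or deleted base changes only the few residues around it rather than everything downstream. That is one plausible reading of why the method is described as insensitive to frame shifts.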
