Choosing BLAST options for better detection of orthologs as reciprocal best hits.

Authors

Moreno-Hagelsieb Gabriel, Latimer Kristen

Affiliation

Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada, N2L 3C5.

Publication

Bioinformatics. 2008 Feb 1;24(3):319-24. doi: 10.1093/bioinformatics/btm585. Epub 2007 Nov 26.

Abstract

MOTIVATION

The analysis of the increasing number of genome sequences requires shortcuts for detecting orthologs, such as Reciprocal Best Hits (RBH), in which two genes, each in a different genome, are assumed to be orthologs if each finds the other as its best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low-information sequence segments and the algorithm used to produce the final alignment. We therefore decided to test whether these options would help detect orthologs better.
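The RBH criterion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that hits have already been parsed into (query, subject, bit score) tuples, that the best hit is the one with the highest bit score, and that ties are broken by keeping the first hit seen. The function names and gene identifiers are hypothetical.

```python
def best_hits(rows):
    """Map each query to its highest-scoring subject.

    rows: iterable of (query, subject, bit_score) tuples.
    Assumption: ties keep the first hit seen, an arbitrary choice.
    """
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # genome A genes searched against genome B
    reverse = best_hits(b_vs_a)   # genome B genes searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits: a3 finds b1 as its best hit, but b1 prefers a1,
# so (a3, b1) is not reciprocal and is left out.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 175.0),
          ("b2", "a1", 88.0), ("b3", "a3", 60.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Because the pairing depends only on which hit ranks first in each direction, any option that shifts alignment scores can change which pairs are reported.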

RESULTS

Using Escherichia coli K12 as an example, we compared the number and quality of orthologs detected as RBH. We tested four different conditions derived from two options: filtering of low-information segments, hard (default) versus soft; and alignment algorithm, default (based on matching words) versus Smith-Waterman. All options resulted in significant differences in the number of orthologs detected, with the highest numbers obtained by combining soft filtering with Smith-Waterman alignments. We compared these results with those of Reciprocal Shortest Distances (RSD), which is supposed to be superior to RBH because it uses an evolutionary measure of distance, rather than BLAST statistics, to rank homologs and thus detect orthologs. RSD barely increased the number of orthologs detected over those found with RBH. Error estimates, based on analyses of conservation of gene order, found small differences in the quality of orthologs detected using RBH. However, RSD showed the highest error rates. Thus, RSD has no advantage over RBH.
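As a rough illustration of the four conditions, the sketch below builds one command line per combination of filtering and alignment option. It assumes the current NCBI BLAST+ `blastp` program and its `-seg`, `-soft_masking` and `-use_sw_tback` options; the abstract does not give the commands actually used, and the study predates BLAST+, so this is not the authors' setup. The file and database names are hypothetical, and the commands are only constructed here, not run.

```python
from itertools import product


def blastp_args(query, db, soft_filter, smith_waterman):
    """Build a blastp argument list for one filtering/alignment condition."""
    args = [
        "blastp", "-query", query, "-db", db,
        "-outfmt", "6",    # tabular output, one hit per line
        "-seg", "yes",     # filter low-information segments with SEG
        # soft masking applies the filter to the word search only;
        # hard masking also keeps it in the final alignment
        "-soft_masking", "true" if soft_filter else "false",
    ]
    if smith_waterman:
        # final alignment computed with Smith-Waterman
        args.append("-use_sw_tback")
    return args


# Keys are (soft_filter, smith_waterman); names below are hypothetical.
conditions = {
    (soft, sw): blastp_args("genomeA.faa", "genomeB", soft, sw)
    for soft, sw in product([False, True], repeat=2)
}
```

Each list could be passed to `subprocess.run` once for each search direction, giving the two hit tables that an RBH comparison needs.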

AVAILABILITY

Orthologs detected as Reciprocal Best Hits using soft masking and Smith-Waterman alignments can be downloaded from http://popolvuh.wlu.ca/Orthologs.
