Application of the MAFFT sequence alignment program to large data-reexamination of the usefulness of chained guide trees.

Authors

Yamada Kazunori D, Tomii Kentaro, Katoh Kazutaka

Affiliations

Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan; Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan.

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan; Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan.

Publication information

Bioinformatics. 2016 Nov 1;32(21):3246-3251. doi: 10.1093/bioinformatics/btw412. Epub 2016 Jul 4.

Abstract

MOTIVATION

Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones.

RESULTS

We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods available in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Two other programs that can handle large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 across the three datasets, suggesting that they are safer to use.
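To make the term "chained guide tree" concrete, the following minimal Python sketch builds a fully chained (caterpillar) tree, in which sequences are added one at a time, and contrasts it with a balanced tree. It is an illustration only, not MAFFT's implementation: the function names are hypothetical, the chain simply follows input order, and the balanced tree is a stand-in for a similarity-based guide tree, which in practice would be derived from sequence distances.

```python
def chained_tree(names):
    """Return a fully chained (caterpillar) tree in Newick format.

    Assumption: sequences are chained in the order given.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequence names")
    tree = f"({names[0]},{names[1]})"
    # Each remaining sequence is joined to the growing alignment one at a time.
    for name in names[2:]:
        tree = f"({tree},{name})"
    return tree + ";"


def _balanced(names):
    # Recursively split the list in half; a single name is a leaf.
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return f"({_balanced(names[:mid])},{_balanced(names[mid:])})"


def balanced_tree(names):
    """Return a balanced tree in Newick format (illustrative stand-in only)."""
    if len(names) < 2:
        raise ValueError("need at least two sequence names")
    return _balanced(names) + ";"


def max_depth(newick):
    """Return the maximum nesting depth of parentheses in a Newick string."""
    depth = deepest = 0
    for ch in newick:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


if __name__ == "__main__":
    names = [f"seq{i}" for i in range(1, 9)]
    for label, tree in (("chained", chained_tree(names)),
                        ("balanced", balanced_tree(names))):
        print(label, tree, "depth:", max_depth(tree))
```

For N sequences the chained tree has depth N-1, whereas the balanced tree has depth of roughly log2(N); this difference in tree shape is what the comparison between methods 1 and 2 examines.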

AVAILABILITY AND IMPLEMENTATION

http://mafft.cbrc.jp/alignment/software/

CONTACT

katoh@ifrec.osaka-u.ac.jp

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Making automated multiple alignments of very large numbers of protein sequences.
Bioinformatics. 2013 Apr 15;29(8):989-95. doi: 10.1093/bioinformatics/btt093. Epub 2013 Feb 21.
