Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper.

Author information

Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium.

Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium.

Publication information

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae276.

DOI: 10.1093/bioinformatics/btae276

PMID: 38648741

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11099654/

Abstract

SUMMARY

SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements.
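The summary does not say how the sequence identity-based subsets are built, so the following is only a minimal sketch of the general idea, not the pipeline's implementation. It assumes a greedy scheme in which each sequence joins the first subset whose representative it resembles closely enough. The function names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a crude stand-in for alignment-based sequence identity are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude proxy for pairwise sequence identity.

    Assumption: a real workflow would derive identity from an alignment
    or a dedicated clustering tool; this ratio is only a stand-in.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def identity_subsets(seqs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group sequence IDs into subsets of mutually similar sequences.

    Longer sequences are visited first; each sequence joins the first subset
    whose representative (its first member) reaches the threshold, otherwise
    it starts a new subset.
    """
    subsets: list[tuple[str, list[str]]] = []  # (representative sequence, member IDs)
    for seq_id, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        for representative, members in subsets:
            if similarity(representative, seq) >= threshold:
                members.append(seq_id)
                break
        else:
            # No existing subset is similar enough: start a new one.
            subsets.append((seq, [seq_id]))
    return [members for _, members in subsets]


# Hypothetical toy input: seqA and seqB differ at one position, seqC is unrelated.
example = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",
    "seqC": "GGGGSSSSPP",
}
print(identity_subsets(example))
```

Under this assumption, each subset could be aligned independently and in parallel, with the per-subset alignments merged afterwards.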

AVAILABILITY AND IMPLEMENTATION

The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d6f/11099654/fced1dfe1794/btae276f1.jpg
