Aligning biological sequences by exploiting residue conservation and coevolution.

Affiliations

Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy.

Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France.

Publication information

Phys Rev E. 2020 Dec;102(6-1):062409. doi: 10.1103/PhysRevE.102.062409.

DOI: 10.1103/PhysRevE.102.062409
PMID: 33465950

Abstract

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume an independent evolution of different positions. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which overcomes the limitations of profile models, includes coevolution among positions in a general way, and is therefore universally applicable to protein and RNA sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
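
The contrast the abstract draws between profile models and coevolution-aware models can be illustrated with a small sketch. It is not the DCAlign algorithm and involves no message passing: it only scores already-aligned toy sequences under an independent-site profile and under a pairwise (Potts-style) model. The alphabet, the toy alignment, the pseudocount, and the hand-set coupling table are all assumptions made for illustration; in practice the couplings would be inferred from data.

```python
import math

ALPHABET = "-ACGU"  # gap plus RNA nucleotides; toy alphabet assumed for illustration


def column_frequencies(msa, pseudocount=1.0):
    """Per-column symbol frequencies with a uniform pseudocount."""
    length = len(msa[0])
    q = len(ALPHABET)
    freqs = []
    for i in range(length):
        counts = {a: pseudocount / q for a in ALPHABET}
        for seq in msa:
            counts[seq[i]] += 1.0
        total = len(msa) + pseudocount
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs


def profile_log_score(seq, freqs):
    """Profile model: positions are independent, so log-probabilities add."""
    return sum(math.log(freqs[i][a]) for i, a in enumerate(seq))


def potts_log_score(seq, fields, couplings):
    """Unnormalized pairwise-model log-weight: site fields plus pair couplings."""
    score = sum(fields[i][a] for i, a in enumerate(seq))
    for (i, j), table in couplings.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score


# Toy alignment (hypothetical): columns 0 and 3 covary like a base pair.
msa = ["GAAC", "CAAG", "AAAU", "UAAA"]
freqs = column_frequencies(msa)

# Fields taken from the profile; couplings set by hand, not inferred.
fields = [{a: math.log(f) for a, f in col.items()} for col in freqs]
couplings = {(0, 3): {("G", "C"): 1.0, ("C", "G"): 1.0,
                      ("A", "U"): 1.0, ("U", "A"): 1.0}}

paired, mismatched = "GAAC", "GAAG"
print(profile_log_score(paired, freqs), profile_log_score(mismatched, freqs))
print(potts_log_score(paired, fields, couplings),
      potts_log_score(mismatched, fields, couplings))
```

Because each of columns 0 and 3 is individually uniform over A, C, G, and U in this toy alignment, the profile assigns the same score to the paired and the mismatched sequence, while the pairwise term separates them.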

Similar articles

1
Aligning biological sequences by exploiting residue conservation and coevolution.
Phys Rev E. 2020 Dec;102(6-1):062409. doi: 10.1103/PhysRevE.102.062409.
2
Sequence coevolution between RNA and protein characterized by mutual information between residue triplets.
PLoS One. 2012;7(1):e30022. doi: 10.1371/journal.pone.0030022. Epub 2012 Jan 18.
3
Protein-protein interactions leave evolutionary footprints: High molecular coevolution at the core of interfaces.
Protein Sci. 2017 Dec;26(12):2438-2444. doi: 10.1002/pro.3318. Epub 2017 Oct 25.
4
Accurate simulation and detection of coevolution signals in multiple sequence alignments.
PLoS One. 2012;7(10):e47108. doi: 10.1371/journal.pone.0047108. Epub 2012 Oct 16.
5
AL2CO: calculation of positional conservation in a protein sequence alignment.
Bioinformatics. 2001 Aug;17(8):700-12. doi: 10.1093/bioinformatics/17.8.700.
6
An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments.
J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975.
7
Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction.
BMC Bioinformatics. 2012 Sep 14;13:235. doi: 10.1186/1471-2105-13-235.
8
Small-coupling expansion for multiple sequence alignment.
Phys Rev E. 2023 Apr;107(4-1):044125. doi: 10.1103/PhysRevE.107.044125.
9
Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins.
BMC Bioinformatics. 2011 Apr 27;12:122. doi: 10.1186/1471-2105-12-122.
10
Size and structure of the sequence space of repeat proteins.
PLoS Comput Biol. 2019 Aug 15;15(8):e1007282. doi: 10.1371/journal.pcbi.1007282. eCollection 2019 Aug.

Cited by

1
Harnessing deep learning for proteome-scale detection of amyloid signaling motifs.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i420-i428. doi: 10.1093/bioinformatics/btaf200.
2
DCAlign v1.0: aligning biological sequences using co-evolution models and informed priors.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad537.
3
Exploring a diverse world of effector domains and amyloid signaling motifs in fungal NLR proteins.
PLoS Comput Biol. 2022 Dec 21;18(12):e1010787. doi: 10.1371/journal.pcbi.1010787. eCollection 2022 Dec.
4
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac724.
5
Constructing benchmark test sets for biological sequence analysis using independent set algorithms.
PLoS Comput Biol. 2022 Mar 7;18(3):e1009492. doi: 10.1371/journal.pcbi.1009492. eCollection 2022 Mar.
6
PPalign: optimal alignment of Potts models representing proteins with direct coupling information.
BMC Bioinformatics. 2021 Jun 10;22(1):317. doi: 10.1186/s12859-021-04222-4.
7
Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars.
BMC Bioinformatics. 2021 Apr 29;22(1):222. doi: 10.1186/s12859-021-04139-y.
8
Remote homology search with hidden Potts models.
PLoS Comput Biol. 2020 Nov 30;16(11):e1008085. doi: 10.1371/journal.pcbi.1008085. eCollection 2020 Nov.