Algorithms for matching partially labelled sequence graphs.

Author information

Taylor William R

Affiliation

Francis Crick Institute, 1 Midland Road, London, NW1 1AT UK.

Publication information

Algorithms Mol Biol. 2017 Sep 25;12:24. doi: 10.1186/s13015-017-0115-y. eCollection 2017.

DOI:10.1186/s13015-017-0115-y
PMID:29021818
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5613400/
Abstract

BACKGROUND

Correlated pairs of positions between proteins are useful in predicting interactions. To find them, two large multiple sequence alignments must be concatenated such that each pair of joined sequences comes from proteins that interact in their species of origin. When each protein is unique within a species, the species name is sufficient to guide this match. When there are multiple related sequences (paralogs) in each species, the pairing is more difficult. In bacteria, a good guide can be gained from genome co-location, as interacting proteins tend to lie in a common operon, but in eukaryotes this simple principle is not sufficient.
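The easy case described above, where the species name alone decides the pairing, can be sketched in a few lines. This is an illustration only, not code from the paper: the function names and the `(species, aligned_sequence)` input format are assumptions. Species with more than one sequence in either alignment are set aside as ambiguous, which is the paralog problem the paper addresses.

```python
from collections import defaultdict


def group_by_species(alignment):
    """Group aligned sequences by species name.

    `alignment` is a list of (species, aligned_sequence) tuples
    (a hypothetical input format chosen for this sketch).
    """
    groups = defaultdict(list)
    for species, seq in alignment:
        groups[species].append(seq)
    return groups


def concatenate_singletons(aln_a, aln_b):
    """Join rows of two alignments for species with exactly one sequence in each.

    Returns (joined, ambiguous): `joined` maps species to the concatenated
    row; `ambiguous` lists shared species where paralogs prevent a
    name-based match.
    """
    a, b = group_by_species(aln_a), group_by_species(aln_b)
    joined, ambiguous = {}, []
    for species in sorted(set(a) & set(b)):
        if len(a[species]) == 1 and len(b[species]) == 1:
            joined[species] = a[species][0] + b[species][0]
        else:
            ambiguous.append(species)
    return joined, ambiguous
```

Species present in only one of the two alignments are dropped, since they contribute no paired row to the concatenated alignment.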

RESULTS

The methods developed in this paper take sets of paralogs for different proteins found in the same species and make a pairing based on their evolutionary distance relative to a set of other proteins that are unique and so have a known relationship (singletons). The former constitute a set of unlabelled nodes in a graph while the latter are labelled. Two variants were tested, one based on a phylogenetic tree of the sequences (the topology-based method) and a simpler, faster variant based only on the inter-sequence distances (the distance-based method). Over a set of test proteins, both gave good results, with the topology method performing slightly better.
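One way to picture the distance-based variant is as a small assignment problem. The sketch below is a simplified illustration and not the paper's algorithm: it assumes each paralog is summarised by a profile of its distances to the singletons of its own protein family, listed in the same species order for both families, and it pairs paralogs whose profiles are most alike. The names `profile_gap` and `pair_paralogs`, the Euclidean comparison, and the brute-force search are all assumptions made for the example.

```python
from itertools import permutations
from math import sqrt


def profile_gap(p, q):
    """Euclidean distance between two equal-length distance profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pair_paralogs(profiles_a, profiles_b):
    """Pair each paralog of family A with a distinct paralog of family B.

    Each argument maps a paralog name to its list of distances to the
    labelled singletons (same species order in both families; an
    assumption of this sketch). The pairing with the smallest summed
    profile gap is returned as a dict {a_name: b_name}.
    Brute force over permutations, so only suitable for small sets.
    """
    names_a = sorted(profiles_a)
    names_b = sorted(profiles_b)
    if len(names_a) > len(names_b):
        raise ValueError("family A must not have more paralogs than family B")
    best_cost, best = float("inf"), None
    for perm in permutations(names_b, len(names_a)):
        cost = sum(
            profile_gap(profiles_a[a], profiles_b[b])
            for a, b in zip(names_a, perm)
        )
        if cost < best_cost:
            best_cost, best = cost, dict(zip(names_a, perm))
    return best
```

With two paralogs per family, for instance, a paralog of A that sits close to the first singleton and far from the third is paired with the paralog of B showing the same pattern. For larger sets, a polynomial-time assignment solver would replace the permutation search.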

CONCLUSIONS

The methods developed here still need refinement and augmentation from constraints other than the sequence data alone, such as known interactions from annotation and databases, or non-trivial relationships in genome location. With the ever-growing number of eukaryotic genomes, it is hoped that the methods described here will open a route to using these data with a success equal to that currently attained with bacterial sequences.
