Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures.

Author information

Wright Erik S

Affiliations

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15219, USA.

Center for Evolutionary Biology and Medicine, Pittsburgh, PA 15219, USA.

Publication information

Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf013.

DOI:10.1093/gbe/evaf013
PMID:39852593
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11812678/
Abstract

Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment. In this regime, it is possible to observe nearly identical protein structures lacking detectable sequence similarity. In the absence of a robust statistical framework for structure comparison, it is largely assumed similar structures are homologous. However, it is conceivable that matching structures could arise through convergent evolution, resulting in analogous proteins without shared ancestry. Large databases of predicted structures offer a means of determining whether analogs are present among structure matches. Here, I find that a small subset (∼2.6%) of Foldseek clusters lack sequence-level support for homology, including ∼1% of strong structure matches with template modeling score ≥ 0.5. This result by itself does not imply these structure pairs are nonhomologous, since their sequences could have diverged beyond the limits of recognition. Yet, strong matches without sequence-level support for homology are enriched in structures with predicted repeats that could induce spurious matches. Some of these structural repeats are underpinned by sequence-level tandem repeats in both matching structures. I show that many of these tandem repeat units have genealogies inconsistent with their corresponding structures sharing a common ancestor, implying these highly similar structure pairs are analogous rather than homologous. This result suggests caution is warranted when inferring homology from structural resemblance alone in the absence of sequence-level support for homology.
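The filtering logic described in the abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the `StructurePair` record, its field names, and the example values are hypothetical, and only the 0.5 template modeling score threshold comes from the abstract. The sketch selects strong structure matches that lack sequence-level support for homology, then narrows them to pairs in which both proteins contain tandem repeats, which are the cases the abstract examines further.

```python
from dataclasses import dataclass

TM_STRONG = 0.5  # threshold for a "strong" structure match, taken from the abstract


@dataclass(frozen=True)
class StructurePair:
    """One structure match. All fields are hypothetical illustration data."""
    query: str
    target: str
    tm_score: float          # template modeling score, between 0 and 1
    sequence_support: bool   # True if sequence-level evidence of homology exists
    both_have_repeats: bool  # True if tandem repeats were found in both proteins


def unsupported_strong_matches(pairs):
    """Strong matches (TM-score >= 0.5) without sequence-level support."""
    return [p for p in pairs
            if p.tm_score >= TM_STRONG and not p.sequence_support]


def fraction_unsupported(pairs):
    """Fraction of strong matches that lack sequence-level support."""
    strong = [p for p in pairs if p.tm_score >= TM_STRONG]
    if not strong:
        return 0.0
    return len(unsupported_strong_matches(pairs)) / len(strong)


def repeat_candidates(pairs):
    """Unsupported strong matches where both structures carry tandem repeats."""
    return [p for p in unsupported_strong_matches(pairs) if p.both_have_repeats]


# Made-up example data, not results from the paper.
pairs = [
    StructurePair("A", "B", 0.82, True, False),
    StructurePair("C", "D", 0.61, False, True),
    StructurePair("E", "F", 0.43, False, False),
    StructurePair("G", "H", 0.55, True, True),
]
```

As the abstract notes, a pair returned by `unsupported_strong_matches` is not thereby nonhomologous, since its sequences may simply have diverged beyond recognition. The filter only identifies candidates that would need further evidence, such as the genealogies of the repeat units.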


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3df/11812678/3a045ee516ed/evaf013f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3df/11812678/3101bb4fee74/evaf013f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3df/11812678/78363953dc7e/evaf013f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3df/11812678/44310618c6ec/evaf013f4.jpg

Similar articles

1
Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures.
Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf013.
2
Database of homology-derived protein structures and the structural meaning of sequence alignment.
Proteins. 1991;9(1):56-68. doi: 10.1002/prot.340090107.
3
Improving protein secondary structure prediction based on short subsequences with local structure similarity.
BMC Genomics. 2010 Dec 2;11 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2164-11-S4-S4.
4
Cell surface proteins in archaeal and bacterial genomes comprising "LVIVD", "RIVW" and "LGxL" tandem sequence repeats are predicted to fold as beta-propeller.
Int J Biol Macromol. 2007 Oct 1;41(4):454-68. doi: 10.1016/j.ijbiomac.2007.06.004. Epub 2007 Jun 17.
5
Protein tandem repeats - the more perfect, the less structured.
FEBS J. 2010 Jun;277(12):2673-82. doi: 10.1111/j.1742-464X.2010.07684.x.
6
Tracking repeats using significance and transitivity.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i311-7. doi: 10.1093/bioinformatics/bth911.
7
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families.
Methods Mol Biol. 2019;1851:251-261. doi: 10.1007/978-1-4939-8736-8_13.
8
Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution.
Genome Biol. 2013 Jan 30;14(1):R10. doi: 10.1186/gb-2013-14-1-r10.
9
Prediction of protein subcellular localization.
Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.
10
Supersites within superfolds. Binding site similarity in the absence of homology.
J Mol Biol. 1998 Oct 2;282(4):903-18. doi: 10.1006/jmbi.1998.2043.

Cited by

1
Accurate detection of tandem repeats exposes ubiquitous reuse of biological sequences.
Nucleic Acids Res. 2025 Sep 5;53(17). doi: 10.1093/nar/gkaf866.
