Improving pairwise comparison of protein sequences with domain co-occurrence.

Authors

Menichelli Christophe, Gascuel Olivier, Bréhélin Laurent

Affiliations

IBC, LIRMM, Univ. Montpellier, CNRS, Montpellier, France.

Unité de Bioinformatique Evolutive, C3BI - USR 3756, Institut Pasteur et CNRS, Paris, France.

Publication information

PLoS Comput Biol. 2018 Jan 2;14(1):e1005889. doi: 10.1371/journal.pcbi.1005889. eCollection 2018 Jan.

DOI: 10.1371/journal.pcbi.1005889
PMID: 29293498
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5766236/
Abstract

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and can therefore miss several hits, especially for species that are phylogenetically distant from reference organisms. One solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered by a domain. Our method identified 2240 new domains, most of which could not be linked to any model in the Pfam database. Moreover, our study of the quality of the new domains, in terms of alignment and physicochemical properties, shows that they are close to standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence.
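
To make the idea of using domain context in a BLAST analysis concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation (their source code is at the URL above). It assumes hits are already parsed into (query, target, E-value) tuples and that each protein has a set of already-known domain identifiers. The function name filter_hits, the two E-value thresholds, and the rule "keep a weak hit when both proteins share a known domain" are all hypothetical simplifications chosen for this example.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str, float]            # (query protein, target protein, E-value)
LabeledHit = Tuple[str, str, float, str]


def filter_hits(
    hits: List[Hit],
    annotations: Dict[str, Set[str]],
    strict: float = 1e-3,    # assumed stringent E-value cutoff
    relaxed: float = 10.0,   # assumed permissive cutoff for candidate hits
) -> List[LabeledHit]:
    """Keep strong hits, plus weak hits supported by shared domain context.

    Simplified rule (an assumption of this sketch): a hit whose E-value lies
    between `strict` and `relaxed` is kept only if the query and target
    proteins have at least one known domain in common.
    """
    kept: List[LabeledHit] = []
    for query, target, evalue in hits:
        if evalue <= strict:
            # Passes the usual stringent threshold on its own.
            kept.append((query, target, evalue, "significant"))
        elif evalue <= relaxed:
            # Weak hit: look for supporting context on both proteins.
            shared = annotations.get(query, set()) & annotations.get(target, set())
            if shared:
                kept.append((query, target, evalue, "rescued"))
    return kept


if __name__ == "__main__":
    # Toy data; protein names and domain assignments are made up.
    known = {
        "q1": {"PF00069"},
        "t1": {"PF00069", "PF00017"},
        "t2": {"PF00076"},
    }
    blast_hits = [
        ("q1", "t1", 0.5),     # weak, but q1 and t1 share PF00069
        ("q1", "t2", 0.5),     # weak, no shared domain
        ("q1", "t2", 1e-10),   # strong on its own
    ]
    for row in filter_hits(blast_hits, known):
        print(row)
```

The sketch only shows the filtering logic. Assessing how many unsupported weak hits would pass by chance, and building new domain families from the retained hits, are outside its scope.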
