Discovering co-occurring patterns and their biological significance in protein families.

Publication information

BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S2. doi: 10.1186/1471-2105-15-S12-S2. Epub 2014 Nov 6.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4243116/

Abstract

BACKGROUND

The large influx of biological sequences makes it important to identify and correlate conserved regions in homologous sequences in order to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations in the form of sequence patterns. Patterns from two conserved regions that frequently co-occur on the same sequences are therefore inferred to have joint functionality. We propose a method for finding conserved regions in protein families with frequent co-occurrence patterns. The biological significance of the discovered clusters of conserved regions with co-occurrence patterns can be validated by the three-dimensional closeness of their amino acids and by the biological functionality found in those regions, as supported by published work.

METHODS

Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs), which correspond to conserved regions with amino acid conservation and variation. When one APC frequently co-occurs with another APC on the same sequences, the two APCs are said to have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters).
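The grouping step described above can be illustrated with a minimal sketch. The abstract does not state which co-occurrence score or clustering procedure was used, so this example assumes a Jaccard index over the sets of sequences that contain each APC and a simple threshold-based single-linkage grouping. The names `jaccard`, `cooccurrence_clusters`, the `threshold` value, and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of sequences containing both APCs among those containing either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(apc_to_seqs, threshold=0.5):
    """Group APCs whose pairwise score reaches the threshold (single linkage).

    apc_to_seqs maps an APC identifier to the set of sequence identifiers
    on which that APC occurs. The score and threshold are assumptions,
    not the measure used in the paper.
    """
    parent = {apc: apc for apc in apc_to_seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(apc_to_seqs, 2):
        if jaccard(apc_to_seqs[a], apc_to_seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for apc in apc_to_seqs:
        clusters.setdefault(find(apc), set()).add(apc)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy data: which sequences each APC occurs on.
apc_to_seqs = {
    "APC1": {"s1", "s2", "s3", "s4"},
    "APC2": {"s1", "s2", "s3"},
    "APC3": {"s5", "s6"},
    "APC4": {"s5", "s6", "s7"},
    "APC5": {"s8"},
}
clusters = cooccurrence_clusters(apc_to_seqs)
print(clusters)
```

In this toy input, APC1 and APC2 share three of four sequences and APC3 and APC4 share two of three, so each pair forms a cluster, while APC5 stays on its own. A different score or a different clustering method would slot into the same structure.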

RESULTS

Our results show that, for Co-occurrence Clusters, the three-dimensional distance between their amino acids is smaller than the average distance between amino acids. For the Co-occurrence Clusters of the ubiquitin and cytochrome c families, we observed biological significance among the amino acids residing in the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as conventional and unconventional ubiquitin binding. In cytochrome c, amino acids in the first Co-occurrence Cluster contribute to the binding of other proteins in the electron transport chain, and amino acids in the second Co-occurrence Cluster contribute to the stability of the axial heme ligand.
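The distance comparison reported above can be sketched as follows. The abstract does not specify which atoms were measured or how the average was taken, so this example assumes one coordinate per residue (for instance the C-alpha atom) and uses all residue pairs in the structure as the background. The function name `mean_pairwise_distance`, the coordinates, and the residue numbers are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def mean_pairwise_distance(coords, residues):
    """Average Euclidean distance over all pairs of the given residues."""
    pairs = list(combinations(residues, 2))
    if not pairs:
        raise ValueError("need at least two residues")
    return sum(dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


# Hypothetical coordinates in angstroms, keyed by residue number.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.0, 0.0, 0.0),
    3: (0.0, 4.0, 0.0),
    4: (30.0, 0.0, 0.0),
    5: (0.0, 40.0, 0.0),
}

cluster_residues = [1, 2, 3]  # residues covered by one hypothetical cluster
cluster_mean = mean_pairwise_distance(coords, cluster_residues)
background_mean = mean_pairwise_distance(coords, list(coords))

print(cluster_mean, background_mean, cluster_mean < background_mean)
```

On real data the coordinates would come from a solved structure of a family member, and a statistical test would be needed before drawing a conclusion from the difference.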

CONCLUSIONS

Thus, our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns frequently co-occurring on the same proteins. Co-occurring patterns are biologically significant because of their three-dimensional closeness and other evidence reported in the literature. These results can play an important role in drug discovery, as biologists can quickly identify drug targets for detailed preclinical studies.

PMID:25474736