Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics.

Authors

Zhou Pei-Yuan, Sze-To Antonio, Wong Andrew K C

Affiliation

Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.

Publication

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):103. doi: 10.1186/s12920-018-0417-z.

DOI: 10.1186/s12920-018-0417-z
PMID: 30453949
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6245498/

Abstract

BACKGROUND

Within a protein family, functions that are both similar and diverse are conserved locally. An aligned pattern cluster (APC) can reflect this conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle characteristics of how the conserved regions of protein families work. However, ARAs corresponding to different functionalities, subgroups, or classes can be entangled because of multiple subtle, entwined factors.

METHODS

To discover and disentangle patterns from mixed-mode datasets, such as APCs in which residues are replaced by a list of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical data to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix, and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) that captures statistical deviation from randomness. Applying Principal Component (PC) Decomposition to the SRV yields PCs ranked by their variance. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected onto a vector space with the same basis vectors as the SRV.
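
The pipeline described above (discretize, count, convert to residuals, decompose, re-project) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes equal-frequency binning, a pairwise co-occurrence count as the frequency matrix, and Haberman-style adjusted residuals as the SR. All function names and the toy data are hypothetical.

```python
import numpy as np

def discretize_equal_frequency(values, n_bins=3):
    """Map a numeric column to integer bin labels with roughly equal counts."""
    values = np.asarray(values, dtype=float)
    # Interior quantiles serve as cut points (assumed binning scheme).
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, values, side="right").tolist()

def cooccurrence_matrix(columns):
    """Count how often each pair of (column, value) events shares a row."""
    events = [(j, v) for j, col in enumerate(columns) for v in sorted(set(col))]
    index = {event: k for k, event in enumerate(events)}
    counts = np.zeros((len(events), len(events)))
    for r in range(len(columns[0])):
        present = [index[(j, col[r])] for j, col in enumerate(columns)]
        for a in present:
            for b in present:
                if a != b:
                    counts[a, b] += 1
    return events, counts

def adjusted_residuals(counts):
    """Adjusted residuals: deviation of observed counts from independence."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total
    variance = expected * (1 - row / total) * (1 - col / total)
    with np.errstate(divide="ignore", invalid="ignore"):
        sr = (counts - expected) / np.sqrt(variance)
    return np.nan_to_num(sr)

def disentangle(sr, n_components):
    """PCA via SVD, then re-project each PC's scores into the SRV basis."""
    centered = sr - sr.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    spaces = []
    for k in range(n_components):
        scores = centered @ vt[k]                 # projection onto PC k
        spaces.append(np.outer(scores, vt[k]))    # same basis as the SRV
    return centered, singular_values, spaces

# Toy mixed-mode data (hypothetical): two residue columns, one numeric property.
pos1 = ["C", "C", "C", "H", "H", "H"]
pos2 = ["K", "K", "R", "R", "R", "K"]
prop = discretize_equal_frequency([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], n_bins=3)

events, counts = cooccurrence_matrix([pos1, pos2, prop])
sr = adjusted_residuals(counts)
centered, singular_values, spaces = disentangle(sr, n_components=len(events))
```

Each matrix in `spaces` keeps only the associations captured by one principal component, so large entries in it point to event pairs that deviate from randomness along that component. Summing the matrices over all components recovers the centered residual matrix.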

RESULTS

Experiments on synthetic, cytochrome c, and class A scavenger receptor data show that E-ARADD can (a) disentangle the entwined ARAs in APCs (given as residues or as biochemical properties) and (b) reveal subtle AR clusters relating to classes, subtle subgroups, or specific functionalities.

CONCLUSIONS

E-ARADD can discover and disentangle ARs and ARAs that are entangled in the functionality and location of protein families, revealing functional subgroups and subgroup characteristics of biologically conserved regions. Experimental results on synthetic data provide proof-of-concept validation that the disentanglement succeeds and reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data demonstrated the efficacy of E-ARADD in handling both types of residue data. Our novel methodology can not only discover and disentangle ARs and ARAs in specific statistical/functional spaces (PCs and re-projected SRVs, RSRVs), but also locate them in the functional domains of the protein family. The success of E-ARADD shows its great potential for proteomic research, drug discovery, and precision and personalized genetic medicine.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/f47774006994/12920_2018_417_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/da305fd2dd12/12920_2018_417_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/de5d5d2fc772/12920_2018_417_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/ea9bc8461e77/12920_2018_417_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/5a15837cbced/12920_2018_417_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/d7797f520986/12920_2018_417_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/e9a2aaf0c6bd/12920_2018_417_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/a6a5612b4b9b/12920_2018_417_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/297e47d4c6d6/12920_2018_417_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/21d1749b971a/12920_2018_417_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/387df41c4579/12920_2018_417_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/992f1fec0045/12920_2018_417_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/67a4ff1647b7/12920_2018_417_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/aee34c5fa4ee/12920_2018_417_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/01bacb09b31e/12920_2018_417_Fig15_HTML.jpg
