Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics.

Authors

Zhou Pei-Yuan, Sze-To Antonio, Wong Andrew K C

Affiliation

Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.

Publication

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):103. doi: 10.1186/s12920-018-0417-z.

Abstract

BACKGROUND

Members of a protein family share similar yet diverse functions that are locally conserved. An aligned pattern cluster (APC) can reflect this conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle characteristics of the inner workings of conserved regions in protein families. However, ARAs corresponding to different functionalities, subgroups, or classes can be entangled because of multiple subtle, entwined factors.

METHODS

To discover and disentangle patterns from mixed-mode datasets, such as APCs in which the residues are replaced by lists of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD first discretizes the numerical data to transform the mixed-mode dataset into an event-value dataset. It then constructs an ARA Frequency Matrix and converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) that captures statistical deviation from randomness. Applying Principal Component (PC) Decomposition to the SRV yields PCs ranked by their variance. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected onto a vector space with the same basis vectors as the SRV.
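The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy APC, the made-up numeric property, the equal-frequency discretization, the Haberman-style adjusted residual, and the column centering before the decomposition are all choices made here for illustration only.

```python
import numpy as np

def discretize(values, bins=3):
    """Equal-frequency binning (an assumed discretization scheme)."""
    edges = np.quantile(values, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(values, edges)

def one_hot_apc(rows):
    """Encode aligned rows as a binary matrix.

    Each column is one aligned residue (AR), i.e. a (position, value) pair.
    """
    items = sorted({(p, v) for row in rows for p, v in enumerate(row)})
    index = {item: j for j, item in enumerate(items)}
    X = np.zeros((len(rows), len(items)))
    for i, row in enumerate(rows):
        for p, v in enumerate(row):
            X[i, index[(p, v)]] = 1.0
    return X, items

def adjusted_residuals(X, items):
    """Co-occurrence frequency matrix and its adjusted residuals."""
    n = X.shape[0]
    freq = X.T @ X                      # observed AR-AR co-occurrence counts
    counts = X.sum(axis=0)              # marginal count of each AR
    expected = np.outer(counts, counts) / n
    var = expected * np.outer(1 - counts / n, 1 - counts / n)
    with np.errstate(divide="ignore", invalid="ignore"):
        sr = np.where(var > 0, (freq - expected) / np.sqrt(var), 0.0)
    # ARs at the same aligned position cannot co-occur; ignore those pairs.
    pos = np.array([p for p, _ in items])
    sr[pos[:, None] == pos[None, :]] = 0.0
    return freq, sr

def disentangle(sr, k=0):
    """Project the SR vectors on the k-th PC, then re-project to the SR basis."""
    centered = sr - sr.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[k]
    scores = centered @ axis            # projection of each AR on PC k
    rsrv = np.outer(scores, axis)       # same basis vectors as the SR matrix
    variance = s ** 2 / (sr.shape[0] - 1)
    return scores, rsrv, variance

# Toy mixed-mode APC: three aligned residues plus one made-up numeric property.
seqs = ["ACD", "ACD", "ACD", "GHK", "GHK", "GHK"]
prop = np.array([0.1, 0.2, 0.3, 0.9, 1.0, 1.1])
rows = [list(s) + [f"bin{b}"] for s, b in zip(seqs, discretize(prop, bins=2))]

X, items = one_hot_apc(rows)
freq, sr = adjusted_residuals(X, items)
scores, rsrv, variance = disentangle(sr, k=0)

# ARs with the same score sign on the first PC fall into the same group.
for sign in (1, -1):
    print([item for item, sc in zip(items, scores) if sign * sc > 0])
```

In this toy case the sign of each AR's score on the first PC separates the two hidden subgroups without any class label as input. The sign of a principal axis is arbitrary, so only the grouping is meaningful, not which group is positive.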

RESULTS

Experiments on synthetic, cytochrome c, and class A scavenger data show that E-ARADD can (a) disentangle the entwined ARAs in APCs (with residues or biochemical properties) and (b) reveal subtle AR clusters relating to classes, subtle subgroups, or specific functionalities.

CONCLUSIONS

E-ARADD can discover and disentangle ARs and ARAs that are entangled in the functionality and location of protein families, revealing functional subgroups and subgroup characteristics of biologically conserved regions. Experimental results on synthetic data provide proof-of-concept validation of successful disentanglement, which reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data demonstrated the efficacy of E-ARADD in handling both types of residue data. Our methodology can discover and disentangle ARs and ARAs not only in specific statistical/functional spaces (PCs and RSRVs), but also by their locations in the functional domains of the protein family. The success of E-ARADD shows its potential for proteomic research, drug discovery, and precision and personalized genetic medicine.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd28/6245498/f47774006994/12920_2018_417_Fig1_HTML.jpg
