Department of Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, N2L 3G1, Ontario, Canada.
Biosciences Division, SRI International, 333 Ravenswood Ave, Menlo Park, CA, USA.
Sci Rep. 2018 Oct 4;8(1):14841. doi: 10.1038/s41598-018-32834-z.
Residue-residue close contact (R2R-C) data procured from three-dimensional protein-protein interaction (PPI) experiments is currently used to predict residue-residue interaction (R2R-I) in PPI. However, because of complex physicochemical environments, R2R-I incidences, facilitated by multiple factors, are usually entangled in the source environment and masked in the acquired data. Here we present a novel method, P2K (Pattern to Knowledge), that disentangles R2R-I patterns and renders much more succinct discriminative information expressed in different specific R2R-I statistical/functional spaces. Since such knowledge is not visible in the acquired data, we refer to it as deep knowledge. We leveraged the discovered deep knowledge to construct machine learning models for sequence-based R2R-I prediction without trial-and-error combination of features drawn from external knowledge of sequences. Our R2R-I predictor was validated for its effectiveness under stringent leave-one-complex-out-alone cross-validation on a benchmark dataset, and was surprisingly demonstrated to outperform an existing sequence-based R2R-I predictor by 28% (p: 1.9E-08). P2K is accessible via our web server at https://p2k.uwaterloo.ca.
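The evaluation above holds out whole complexes rather than individual residue pairs, so that no pair from the test complex leaks into training. A minimal sketch of such leave-one-complex-out splitting follows, assuming each residue pair is tagged with the identifier of the complex it came from. The function name `leave_one_complex_out` and the complex IDs (`"1ABC"`, etc.) are hypothetical, and the exact "leave-one-complex-out-alone" protocol used for P2K may differ in its details.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_complex_out(
    complex_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_complex, train_indices, test_indices), one fold per complex.

    Hypothetical helper: complex_ids[i] is the complex that residue pair i
    belongs to. All pairs of one complex are held out together, so the
    model never sees any pair from the test complex during training.
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, cid in enumerate(complex_ids):
        groups[cid].append(idx)  # group pair indices by their complex
    for held_out, test_idx in groups.items():
        test_set = set(test_idx)
        train_idx = [i for i in range(len(complex_ids)) if i not in test_set]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical example: five residue pairs drawn from three complexes.
    pair_complex = ["1ABC", "1ABC", "2XYZ", "3DEF", "2XYZ"]
    for cid, train, test in leave_one_complex_out(pair_complex):
        print(cid, train, test)
    # 1ABC [2, 3, 4] [0, 1]
    # 2XYZ [0, 1, 3] [2, 4]
    # 3DEF [0, 1, 2, 4] [3]
```

Grouping by complex is what makes this stricter than ordinary leave-one-out over residue pairs: pairs from the same complex tend to be correlated, and splitting them across train and test would inflate the measured performance.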