Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations.

Affiliations

Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854.

Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854.

Publication information

Proc Natl Acad Sci U S A. 2019 Jan 2;116(1):168-176. doi: 10.1073/pnas.1805256116. Epub 2018 Dec 26.

Abstract

Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein-peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence-energetics-function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function (site-specific cleavages of the viral polyprotein) is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured cleavability of ∼30,000 substrates (∼1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide variety of proteases, including human-origin enzymes. Our results also suggest a possible role for residue-level energetics in shaping plateau-like functional landscapes predicted from viral quasispecies theory.
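The numbers in the abstract can be checked with a short calculation, and the idea of clustering in sequence space can be illustrated on a toy example. The Python sketch below is not the authors' pipeline. It assumes each of the five randomized positions can hold any of the 20 canonical amino acids (an inference, though it reproduces the stated 3.2 million variants), and it assumes two sequences are neighbors when they differ by a single substitution; the abstract does not say how the graph was built. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues (assumed alphabet)
RANDOMIZED_POSITIONS = 5               # stated in the abstract


def library_size(n_positions: int = RANDOMIZED_POSITIONS) -> int:
    """Number of distinct substrates if every position is fully randomized."""
    return len(AMINO_ACIDS) ** n_positions


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def connected_components(seqs):
    """Group sequences linked by chains of single substitutions.

    Assumption: an edge joins two sequences at Hamming distance 1.
    Uses a small union-find; the pairwise loop is quadratic, so this is
    only suitable for toy inputs, not for millions of sequences.
    """
    seqs = list(dict.fromkeys(seqs))  # drop duplicates, keep order
    parent = {s: s for s in seqs}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if hamming(a, b) == 1:
            parent[find(a)] = find(b)

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    # Largest cluster first, ties broken alphabetically
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


if __name__ == "__main__":
    total = library_size()
    print(f"library size: {total:,}")
    print(f"fraction measured: {30_000 / total:.2%}")
    toy = ["DEMEE", "DEMEC", "DLMEC", "AAAAA"]  # hypothetical sequences
    print(connected_components(toy))
```

Under these assumptions, 30,000 measured substrates come to just under 1% of the library, in line with the abstract's "∼1%". In the toy input, the first three sequences form one cluster and the fourth stays isolated.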
