Protein structure determination using metagenome sequence data.

Authors

Ovchinnikov Sergey, Park Hahnbeom, Varghese Neha, Huang Po-Ssu, Pavlopoulos Georgios A, Kim David E, Kamisetty Hetunandan, Kyrpides Nikos C, Baker David

Affiliations

Department of Biochemistry, University of Washington, Seattle, WA 98105, USA.

Institute for Protein Design, University of Washington, Seattle, WA 98105, USA.

Publication information

Science. 2017 Jan 20;355(6322):294-298. doi: 10.1126/science.aah4043.

DOI:10.1126/science.aah4043

PMID:28104891

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5493203/

Abstract

Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.

Similar articles

Protein structure determination using metagenome sequence data.

Science. 2017 Jan 20;355(6322):294-298. doi: 10.1126/science.aah4043.

Protein structure prediction using Rosetta in CASP12.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):113-121. doi: 10.1002/prot.25390. Epub 2017 Oct 8.

Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction.

Proc Natl Acad Sci U S A. 2021 Dec 7;118(49). doi: 10.1073/pnas.2110828118.

Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era.

Proc Natl Acad Sci U S A. 2013 Sep 24;110(39):15674-9. doi: 10.1073/pnas.1314045110. Epub 2013 Sep 5.

Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):51-66. doi: 10.1002/prot.25407. Epub 2017 Nov 7.

phenix.mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta.

J Struct Funct Genomics. 2012 Jun;13(2):81-90. doi: 10.1007/s10969-012-9129-3. Epub 2012 Mar 15.

Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12.

Proteins. 2018 Mar;86 Suppl 1(Suppl 1):136-151. doi: 10.1002/prot.25414. Epub 2017 Nov 14.

Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction.

PLoS Comput Biol. 2015 Dec 29;11(12):e1004661. doi: 10.1371/journal.pcbi.1004661. eCollection 2015 Dec.

Assessing Predicted Contacts for Building Protein Three-Dimensional Models.

Methods Mol Biol. 2017;1484:115-126. doi: 10.1007/978-1-4939-6406-2_9.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13.

Proteins. 2019 Dec;87(12):1165-1178. doi: 10.1002/prot.25697. Epub 2019 Apr 25.

Cited by

A genomic view of Earth's biomes.

Nat Rev Genet. 2025 Sep 15. doi: 10.1038/s41576-025-00888-1.

Improving prediction accuracy in chimeric proteins with windowed multiple sequence alignment.

Comput Struct Biotechnol J. 2025 Jul 23;27:3292-3298. doi: 10.1016/j.csbj.2025.07.039. eCollection 2025.

Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER.

Nat Biotechnol. 2025 May 23. doi: 10.1038/s41587-025-02654-4.

The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool.

AI Ethics. 2025 Apr;5(2):1499-1521. doi: 10.1007/s43681-024-00493-8. Epub 2024 May 27.

The RNA helicase HrpA rescues collided ribosomes in E. coli.

Mol Cell. 2025 Mar 6;85(5):999-1007.e7. doi: 10.1016/j.molcel.2025.01.018. Epub 2025 Feb 7.

NuFold: end-to-end approach for RNA tertiary structure prediction with flexible nucleobase center representation.

Nat Commun. 2025 Jan 21;16(1):881. doi: 10.1038/s41467-025-56261-7.

Direct visualization of electric-field-stimulated ion conduction in a potassium channel.

Cell. 2025 Jan 9;188(1):77-88.e15. doi: 10.1016/j.cell.2024.12.006.

Exploring Evolution to Uncover Insights Into Protein Mutational Stability.

Mol Biol Evol. 2025 Jan 6;42(1). doi: 10.1093/molbev/msae267.

The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction.

Biomolecules. 2024 Nov 29;14(12):1531. doi: 10.3390/biom14121531.

Minimal twister sister-like self-cleaving ribozymes in the human genome revealed by deep mutational scanning.

Elife. 2024 Dec 5;12:RP90254. doi: 10.7554/eLife.90254.

References

Crystal structure of an Fe-S cluster-containing fumarate hydratase enzyme from Leishmania major reveals a unique protein fold.

Proc Natl Acad Sci U S A. 2016 Aug 30;113(35):9804-9. doi: 10.1073/pnas.1605031113. Epub 2016 Aug 15.

Structural basis for amino acid export by DMT superfamily transporter YddG.

Nature. 2016 Jun 16;534(7607):417-20. doi: 10.1038/nature17991. Epub 2016 May 30.

Structure of a bd oxidase indicates similar mechanisms for membrane-integrated oxygen reductases.

Science. 2016 Apr 29;352(6285):583-6. doi: 10.1126/science.aaf2477.

Structural basis of lipoprotein signal peptidase II action and inhibition by the antibiotic globomycin.

Science. 2016 Feb 19;351(6275):876-80. doi: 10.1126/science.aad3747.

Crystal structure of E. coli lipoprotein diacylglyceryl transferase.

Nat Commun. 2016 Jan 5;7:10198. doi: 10.1038/ncomms10198.

Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta.

Proteins. 2016 Sep;84 Suppl 1(Suppl 1):67-75. doi: 10.1002/prot.24974. Epub 2016 Feb 24.

The Pfam protein families database: towards a more sustainable future.

Nucleic Acids Res. 2016 Jan 4;44(D1):D279-85. doi: 10.1093/nar/gkv1344. Epub 2015 Dec 15.

Crystal structures of a double-barrelled fluoride ion channel.

Nature. 2015 Sep 24;525(7570):548-51. doi: 10.1038/nature14981. Epub 2015 Sep 7.

Large-scale determination of previously unsolved protein structures using evolutionary information.

Elife. 2015 Sep 3;4:e09248. doi: 10.7554/eLife.09248.

Computation and Functional Studies Provide a Model for the Structure of the Zinc Transporter hZIP4.

J Biol Chem. 2015 Jul 17;290(29):17796-17805. doi: 10.1074/jbc.M114.617613. Epub 2015 May 13.
