Mathematical model for empirically optimizing large scale production of soluble protein domains.

Affiliation

Genomic Sciences Center RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan.

Publication information

BMC Bioinformatics. 2010 Mar 1;11:113. doi: 10.1186/1471-2105-11-113.

DOI:10.1186/1471-2105-11-113

PMID:20193068

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2843616/

Abstract

BACKGROUND

Efficient dissection of large proteins into their structural domains is critical for high throughput proteome analysis. So far, no study has focused on mathematically modeling a protein dissection protocol in terms of a production system. Here, we report a mathematical model for empirically optimizing the cost of large-scale domain production in proteomics research.

RESULTS

The model computes the expected number of successfully produced soluble domains, using a conditional probability between domain and boundary identification. Typical values for the model's parameters were estimated using the experimental results for identifying soluble domains from the 2,032 Kazusa HUGE protein sequences. Among the 215 fragments corresponding to the 24 domains that were expressed correctly, 111, corresponding to 18 domains, were soluble. Our model indicates that, under the conditions used in our pilot experiment, the probability of correctly predicting the existence of a domain was 81% (175/215) and that of correctly predicting its boundary was 63% (111/175). Under these conditions, the most cost- and effort-effective way to produce soluble domains was to prepare one to seven fragments per predicted domain.
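The arithmetic behind these percentages, and the diminishing return from testing more fragments per domain, can be illustrated with a minimal Python sketch. The counts come from the abstract. The yield function is an assumption made here for illustration only and is not the paper's formulation: it treats fragments of the same predicted domain as independent trials that share one domain-existence event. The names `p_soluble_domain` and `yield_table` are hypothetical.

```python
# Counts reported in the abstract.
FRAGMENTS_EXPRESSED = 215  # fragments that were expressed correctly
DOMAIN_NUMERATOR = 175     # numerator reported for domain prediction (175/215)
FRAGMENTS_SOLUBLE = 111    # soluble fragments (111/175)

# Probabilities as reported: about 0.81 and about 0.63.
p_domain = DOMAIN_NUMERATOR / FRAGMENTS_EXPRESSED
p_boundary = FRAGMENTS_SOLUBLE / DOMAIN_NUMERATOR


def p_soluble_domain(n_fragments, p_d=p_domain, p_b=p_boundary):
    """Probability that a predicted domain yields at least one soluble fragment.

    ASSUMPTION (not taken from the paper): the domain exists with
    probability p_d, and given that it exists, each of the n fragments
    independently has a correct boundary with probability p_b.
    """
    return p_d * (1.0 - (1.0 - p_b) ** n_fragments)


def yield_table(max_fragments=10):
    """Return (n, success probability, marginal gain, success per fragment)."""
    rows = []
    previous = 0.0
    for n in range(1, max_fragments + 1):
        p = p_soluble_domain(n)
        rows.append((n, p, p - previous, p / n))
        previous = p
    return rows


if __name__ == "__main__":
    print(f"p_domain = {p_domain:.3f}, p_boundary = {p_boundary:.3f}")
    for n, p, gain, per_fragment in yield_table():
        print(f"n={n:2d}  P(success)={p:.3f}  gain={gain:.3f}  per fragment={per_fragment:.3f}")
```

Under this simplified assumption the success probability saturates at `p_domain`, so each additional fragment adds less than the previous one. How many fragments are worth preparing then depends on the relative cost of a fragment versus the value of a soluble domain, which this sketch does not model.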

CONCLUSIONS

Our mathematical modeling of protein dissection protocols indicates that the optimum number of fragments tested per domain is actually much smaller than expected a priori. The application range of our model is not limited to protein dissection, and it can be utilized for designing various large-scale mutational analyses or screening libraries.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb06/2843616/02dcbe89c900/1471-2105-11-113-1.jpg
