CSM-Toxin: A Web-Server for Predicting Protein Toxicity.

Authors

Morozov Vladimir, Rodrigues Carlos H M, Ascher David B

Affiliations

School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia.

Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia.

Publication information

Pharmaceutics. 2023 Jan 28;15(2):431. doi: 10.3390/pharmaceutics15020431.

Abstract

Biologics are one of the most rapidly expanding classes of therapeutics, but they can be associated with a range of toxic properties. In small-molecule drug development, early identification of potential toxicity has led to a significant reduction in clinical trial failures; however, we currently lack robust qualitative rules or predictive tools for peptide- and protein-based biologics. To address this, we have manually curated the largest set of high-quality experimental data on peptide and protein toxicities and developed CSM-Toxin, a novel in silico protein toxicity classifier that relies solely on the protein primary sequence. Our approach encodes the protein sequence information using a deep learning natural language model to understand "biological" language, where residues are treated as words and protein sequences as sentences. CSM-Toxin accurately identified peptides and proteins with potential toxicity, achieving a Matthews correlation coefficient (MCC) of up to 0.66 across both cross-validation and multiple non-redundant blind tests, outperforming other methods and highlighting the robust and generalisable performance of our model. We strongly believe CSM-Toxin will serve as a valuable platform to minimise potential toxicity in the biologic development pipeline. Our method is freely available as an easy-to-use web server.
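As an illustration only, the two ideas in the abstract that lend themselves to code are treating residues as "words" and scoring a binary classifier with the MCC. The Python sketch below is not the CSM-Toxin implementation: the abstract does not specify the language model, so the vocabulary `VOCAB` and the helpers `tokenize` and `mcc` are hypothetical names introduced here, and the tokenizer is a plain residue-to-integer lookup rather than a learned embedding.

```python
import math

# Hypothetical vocabulary: the 20 standard amino acids, one id per residue.
# Id 0 is reserved for anything outside this alphabet (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence):
    """Treat a protein sequence as a 'sentence' and each residue as a 'word',
    returning one integer id per residue."""
    return [VOCAB.get(residue, 0) for residue in sequence.upper()]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it takes all four confusion-matrix cells into account, which is why it is often preferred over accuracy when toxic and non-toxic classes are imbalanced.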
