Challenges in computational discovery of bioactive peptides in 'omics data.

Affiliations

Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia.

Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China.

Publication information

Proteomics. 2024 Jun;24(12-13):e2300105. doi: 10.1002/pmic.202300105. Epub 2024 Mar 8.

Abstract

Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Because peptides are short, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to the discovery of peptides. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges, as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on using sequence similarity to infer orthology and then transferring functions from characterized proteins to uncharacterized ones, cannot be applied directly to short sequences. The use of these techniques is much more limited, and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides, with a focus on prokaryotic genomes and metagenomes.
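The false-positive problem for smORFs can be illustrated with a minimal sketch. The function below is not a method from the review: it is a hypothetical, deliberately naive six-frame scan that reports every ATG-to-stop stretch within an assumed length window (`min_codons` and `max_codons` are illustrative defaults, and only ATG starts and the three standard stop codons are considered). Run on random DNA that encodes nothing, a scan of this kind would still be expected to report candidates, which is why simple prediction alone is insufficient.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_smorfs(seq, min_codons=10, max_codons=100):
    """Naively list candidate smORFs in all six reading frames.

    Returns (strand, start, end) tuples, with 0-based coordinates on the
    scanned strand and `end` placed just past the stop codon. The codon
    count includes the start codon and excludes the stop codon.
    The length window is an illustrative assumption.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3))
                    start = None
    return hits


def random_dna(length, seed=0):
    """Generate random DNA with no biological signal (hypothetical helper)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    # Every hit on random sequence is, by construction, a false positive.
    candidates = find_smorfs(random_dna(100_000))
    print(f"candidate smORFs in random DNA: {len(candidates)}")
```

Practical pipelines add further evidence on top of such a scan, for example conservation across genomes or learned sequence features, as the abstract's mention of machine-learning approaches suggests.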

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f0b/11537280/4aa171201315/nihms-2008712-f0001.jpg
