TIS Transformer: remapping the human proteome using deep learning.

Authors

Clauwaert Jim, McVey Zahra, Gupta Ramneek, Menschaert Gerben

Affiliations

Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Oost-Vlaanderen 9000, Belgium.

Novo Nordisk Research Centre Oxford, Novo Nordisk Ltd., Crawley, South East England, RH6 0PA, UK.

Publication information

NAR Genom Bioinform. 2023 Mar 3;5(1):lqad021. doi: 10.1093/nargab/lqad021. eCollection 2023 Mar.

Abstract

The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
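To make concrete how a predicted translation initiation site relates to a coding sequence, the following is a minimal sketch and not the TIS Transformer model itself. It assumes the standard stop codons, 0-based start positions supplied by hand, and a cutoff of 100 codons for calling an ORF "short"; that cutoff is a common convention rather than a value stated in the abstract. The names `orf_from_start` and `describe_starts` are hypothetical.

```python
from typing import Optional

# Standard stop codons, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_from_start(transcript: str, start: int) -> Optional[str]:
    """Return the ORF that begins at 0-based index `start` and ends with the
    first in-frame stop codon (inclusive), or None if no stop codon is found."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None


def describe_starts(transcript: str, starts: list, max_sorf_codons: int = 100) -> list:
    """Summarise the ORF implied by each candidate start site.

    `starts` stands in for positions a predictor might report; here they are
    supplied manually. `max_sorf_codons` is an assumed short-ORF threshold.
    """
    summaries = []
    for start in starts:
        orf = orf_from_start(transcript, start)
        if orf is None:
            continue  # no in-frame stop codon: no complete ORF from this start
        n_codons = (len(orf) - 3) // 3  # sense codons, stop codon excluded
        summaries.append({
            "start": start,
            "orf": orf,
            "codons": n_codons,
            "short_orf": n_codons <= max_sorf_codons,
        })
    return summaries


if __name__ == "__main__":
    # Toy transcript carrying two ORFs in different reading frames.
    toy = "CCATGGCTTGATATGAAACCCGGGTAGCC"
    for row in describe_starts(toy, [2, 12]):
        print(row)
```

A single transcript can yield several entries, which mirrors the abstract's point that one transcript may carry more than one coding sequence.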

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05df/9985340/7d3516589018/lqad021fig1.jpg
