BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models.

Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.

Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China.

Publication information

Nucleic Acids Res. 2021 Dec 16;49(22):e129. doi: 10.1093/nar/gkab829.

Abstract

To uncover the meaning of the 'book of life', this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which are able to extract the linguistic properties of the 'book of life'. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9978/8682797/1ce505dd87f1/gkab829fig1.jpg