Biological Sequence Classification: A Review on Data and General Methods.

Author information

Ao Chunyan, Jiao Shihu, Wang Yansu, Yu Liang, Zou Quan

Affiliations

School of Computer Science and Technology, Xidian University, Xi'an, China.

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.

Publication information

Research (Wash D C). 2022 Dec 19;2022:0011. doi: 10.34133/research.0011. eCollection 2022.

Abstract

With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/137e/11404319/977bef24eedb/research.0011.fig.001.jpg
