Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA.

Authors

Yang Aimin, Zhang Wei, Wang Jiahao, Yang Ke, Han Yang, Zhang Limin

Affiliations

College of Science, North China University of Science and Technology, Tangshan, China.

College of Yi Sheng, North China University of Science and Technology, Tangshan, China.

Publication information

Front Bioeng Biotechnol. 2020 Sep 4;8:1032. doi: 10.3389/fbioe.2020.01032. eCollection 2020.

Abstract

Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. Advances in sequencing technology have caused DNA sequence data to grow at an explosive rate, pushing the study of DNA sequences into the era of big data. Machine learning, a powerful technique that analyzes large-scale data and learns automatically to gain knowledge, has been widely used in DNA sequence data analysis and has produced many research achievements. This review first introduces the development of sequencing technology and explains the concepts of DNA sequence data structure and sequence similarity. We then analyze the basic process of data mining, summarize several major machine learning algorithms, and present the challenges that machine learning algorithms face in mining biological sequence data, together with possible future solutions. Next, we review four typical applications of machine learning to DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze the biological background and significance of each application and systematically summarize recent developments and potential problems in DNA sequence data mining. Finally, we summarize the review and outline directions for future research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69a1/7498545/b6a363753e1c/fbioe-08-01032-g001.jpg
