DASSI: differential architecture search for splice identification from DNA sequences.

Author information

Moosa Shabir, Amira Abbes, Boughorbel Sabri

Affiliations

Department of Systems Biology, SIDRA Medicine, Doha, 26999, Qatar.

Dept. of Computer Science and Engineering, Qatar University, Doha, 2713, Qatar.

Publication information

BioData Min. 2021 Feb 15;14(1):15. doi: 10.1186/s13040-021-00237-y.

Abstract

BACKGROUND

The data explosion caused by unprecedented advances in genomics constantly challenges the conventional methods used to interpret the human genome. In recent years, the demand for robust algorithms has driven major successes for Deep Learning (DL) on many difficult tasks in image, speech and natural language processing by automating the manual process of architecture design. This progress has been fueled by the development of new DL architectures. Yet genomics poses unique challenges that require customization and the development of new DL models.

METHODS

We propose a new model, DASSI, by adapting a differential architecture search method and applying it to the Splice Site (SS) recognition task on DNA sequences, in order to discover new high-performance convolutional architectures in an automated manner. We evaluated the discovered model against state-of-the-art tools for classifying true and false SS in Homo sapiens (Human), Arabidopsis thaliana (Plant), Caenorhabditis elegans (Worm) and Drosophila melanogaster (Fly).
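
As an illustration of the general idea behind this kind of architecture search, the sketch below one-hot encodes a DNA sequence and applies a softmax-weighted mixture of candidate operations, which is the continuous relaxation commonly used in differentiable architecture search. It is a minimal sketch under stated assumptions, not the DASSI implementation: the candidate set, the function names (`one_hot`, `mixed_op`, `discretize`) and the example sequence are hypothetical, only parameter-free operations are included for brevity, and a real search space would also contain learned convolutions whose weights are trained jointly with the architecture weights.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order; any fixed order works


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((len(seq), len(BASES)), dtype=np.float64)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def softmax(a: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(a - np.max(a))
    return e / e.sum()


def skip(x: np.ndarray) -> np.ndarray:
    """Identity (skip connection)."""
    return x


def avg_pool3(x: np.ndarray) -> np.ndarray:
    """Length-preserving average pooling (window 3, zero padding) along the sequence axis."""
    p = np.pad(x, ((1, 1), (0, 0)))
    return (p[:-2] + p[1:-1] + p[2:]) / 3.0


def max_pool3(x: np.ndarray) -> np.ndarray:
    """Length-preserving max pooling (window 3, -inf padding) along the sequence axis."""
    p = np.pad(x, ((1, 1), (0, 0)), constant_values=-np.inf)
    return np.maximum(np.maximum(p[:-2], p[1:-1]), p[2:])


# Hypothetical candidate operations for one edge of the searched cell.
CANDIDATE_OPS = {"skip": skip, "avg_pool3": avg_pool3, "max_pool3": max_pool3}


def mixed_op(x: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate outputs."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alpha: np.ndarray) -> str:
    """After the search, keep only the candidate with the largest architecture weight."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]


x = one_hot("CAGGTAAGT")             # illustrative sequence, not from the paper's data
alpha = np.zeros(len(CANDIDATE_OPS))  # untrained architecture weights: uniform mixture
y = mixed_op(x, alpha)
print(y.shape)                                   # (9, 4)
print(discretize(np.array([0.1, 2.0, -1.0])))    # avg_pool3
```

In an actual search, `alpha` would be optimized by gradient descent on a validation loss while the operation weights are trained on the training loss; here `alpha` is fixed by hand only to show how the mixture and the final discretization behave.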

RESULTS

Our experimental evaluation demonstrated that the discovered architecture outperformed baseline models and fixed architectures and showed competitive results against state-of-the-art models used in the classification of splice sites. The proposed model, DASSI, has a compact architecture and showed very good results on a transfer learning task. Benchmarking the execution time and precision of the architecture search and evaluation process showed better performance on recently available GPUs, making it feasible to adopt architecture-search-based methods on large datasets.

CONCLUSIONS

We proposed the use of a differential architecture search method (DASSI) to perform SS classification on raw DNA sequences, and discovered new neural network models with a low number of tunable parameters and performance competitive with manually engineered architectures. We have extensively benchmarked the DASSI model against other state-of-the-art models and assessed its computational efficiency. The results show high potential for using automated architecture search mechanisms to solve various problems in the field of genomics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc3/7885202/e2653e9c0d15/13040_2021_237_Fig1_HTML.jpg
