MSNet-4mC: learning effective multi-scale representations for identifying DNA N4-methylcytosine sites.

Author information

Liu Chunting, Song Jiangning, Ogata Hiroyuki, Akutsu Tatsuya

Affiliations

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan.

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.

Publication information

Bioinformatics. 2022 Nov 30;38(23):5160-5167. doi: 10.1093/bioinformatics/btac671.

Abstract

MOTIVATION

N4-methylcytosine (4mC) is an essential epigenetic modification that regulates a wide range of biological processes. However, experimental methods for detecting 4mC sites are time-consuming and labor-intensive. Computational methods that automatically identify 4mC sites with data analysis techniques have therefore become a reasonable alternative. A major challenge is developing methods that fully exploit the complex interactions within DNA sequences to improve predictive capability.

RESULTS

In this work, we propose MSNet-4mC, a lightweight neural network built on convolutional operations with multi-scale receptive fields that capture cross-element relationships over both short and long ranges of a given DNA sequence. Because the number of candidates is strongly imbalanced across species, we compute class weights and apply them in the cross-entropy loss to balance the training process. Extensive benchmarking experiments show that our method achieves a significant performance improvement and outperforms other state-of-the-art methods.
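
The two ideas in this paragraph, multi-scale receptive fields and class-weighted cross-entropy, can be illustrated with a minimal pure-Python sketch. It is not the MSNet-4mC implementation: the one-hot encoding, the use of parallel kernels of different widths to obtain multiple scales, the inverse-frequency weighting formula, and all function names below are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Assumed encoding: one 4-dimensional one-hot vector per nucleotide.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def one_hot(seq):
    """Encode a DNA string as a list of one-hot vectors."""
    return [ONE_HOT[base] for base in seq]


def conv1d_same(x, kernel):
    """Zero-padded 1D convolution with a single odd-width kernel.

    `kernel` is a list of 4-dimensional weight vectors; its length is the
    receptive field. The output has the same length as the input.
    """
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for j, w in enumerate(kernel):
            pos = i + j - pad
            if 0 <= pos < len(x):  # positions outside the sequence count as zero
                total += sum(a * b for a, b in zip(x[pos], w))
        out.append(total)
    return out


def multi_scale_features(seq, kernels):
    """Apply kernels of different widths to the same sequence (hypothetical helper).

    Narrow kernels respond to short-range patterns, wide kernels to longer ones.
    """
    x = one_hot(seq)
    return [conv1d_same(x, k) for k in kernels]


def class_weights(labels):
    """Inverse-frequency weights, n_samples / (n_classes * class_count).

    This formula is an assumption; the abstract only says weights are computed.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the summed weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


if __name__ == "__main__":
    narrow = [(0, 1, 0, 0)]        # width 1: fires on cytosine only
    wide = [(1, 1, 1, 1)] * 3      # width 3: counts bases in a 3-mer window
    print(multi_scale_features("ACGT", [narrow, wide]))

    labels = [0, 0, 0, 1]          # toy imbalanced labels, 1 = 4mC site
    weights = class_weights(labels)
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    print(weights, weighted_cross_entropy(probs, labels, weights))
```

With these weights the single positive example contributes as much to the loss as the three negatives combined, which is the balancing effect the paragraph describes. A real model would learn the kernel weights and combine the per-scale outputs in later layers.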

AVAILABILITY AND IMPLEMENTATION

The source code and models are freely available for download at https://github.com/LIU-CT/MSNet-4mC. The implementation is written in Python and is supported on Linux and Windows.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
