Liu Chunting, Song Jiangning, Ogata Hiroyuki, Akutsu Tatsuya
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan.
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Bioinformatics. 2022 Nov 30;38(23):5160-5167. doi: 10.1093/bioinformatics/btac671.
N4-methylcytosine (4mC) is an essential epigenetic modification that regulates a wide range of biological processes. However, experimental methods for detecting 4mC sites are time-consuming and labor-intensive. Computational methods that identify 4mC sites automatically with data analysis techniques have therefore become a reasonable alternative. A major challenge is developing methods that fully exploit the complex interactions within DNA sequences to improve predictive capability.
In this work, we propose MSNet-4mC, a lightweight neural network built on convolutional operations with multi-scale receptive fields, which capture relationships between sequence elements over both short and long ranges of a given DNA sequence. Because the number of candidates is strongly imbalanced across species, we compute class weights and apply them in the cross-entropy loss to balance the training process. Extensive benchmarking experiments show that our method achieves a significant performance improvement and outperforms other state-of-the-art methods.
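The two ideas named above, multi-scale receptive fields and a class-weighted cross-entropy loss, can be illustrated with a minimal pure-Python sketch. It is not the MSNet-4mC implementation: the function names, the kernel widths (1, 3, 5), the all-ones kernels and the inverse-frequency weighting formula are assumptions chosen for illustration, and the abstract does not specify how the published model computes its weights or arranges its convolutions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-channel one-hot vectors (A, C, G, T)."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]

def conv1d_same(x, kernel):
    """Slide one kernel over the sequence with zero padding (output length == input length).

    x: list of L positions, each a list of C channel values.
    kernel: list of K positions (K odd), each a list of C weights.
    """
    k = len(kernel)
    pad = k // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for j in range(k):
            pos = i + j - pad
            if 0 <= pos < len(x):  # positions outside the sequence count as zeros
                total += sum(xv * kv for xv, kv in zip(x[pos], kernel[j]))
        out.append(total)
    return out

def multi_scale_features(x, kernels):
    """Apply kernels of different widths and stack their outputs per position."""
    maps = [conv1d_same(x, kern) for kern in kernels]
    return [list(column) for column in zip(*maps)]

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count). Assumed formula."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights used."""
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

# Hypothetical toy data: a 6-nt sequence and a 3:1 imbalanced label set.
x = one_hot("ACGTAC")
kernels = [[[1.0] * 4 for _ in range(width)] for width in (1, 3, 5)]
features = multi_scale_features(x, kernels)

labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)
probs = [[0.5, 0.5]] * 4  # an uninformative classifier
loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy setup each position receives one feature per kernel width, so wider kernels summarize a longer neighborhood of the same position. The minority class gets the larger weight, so its errors contribute more to the loss than those of the majority class.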
The source code and models are implemented in Python, supported on Linux and Windows, and freely available for download at https://github.com/LIU-CT/MSNet-4mC.
Supplementary data are available at Bioinformatics online.