Division of Health Medical Intelligence, Human Genome Center, the Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan.
J Hum Genet. 2024 Oct;69(10):519-525. doi: 10.1038/s10038-024-01275-0. Epub 2024 Jul 31.
Genomic sequences are traditionally represented as strings of characters: A (adenine), C (cytosine), G (guanine), and T (thymine). However, an alternative approach involves depicting sequence-related information through image representations, such as Chaos Game Representation (CGR) and read pileup images. With rapid advancements in deep learning (DL) methods within computer vision and natural language processing, there is growing interest in applying image-based DL methods to genomic sequence analysis. These methods involve encoding genomic information as images or integrating spatial information from images into the analytical process. In this review, we summarize three typical applications that use image processing with DL models for genome analysis. We examine the utilization and advantages of these image-based approaches.
A(腺嘌呤)、C(胞嘧啶)、G(鸟嘌呤)和 T(胸腺嘧啶)。然而,另一种方法是通过图像表示来描绘序列相关信息,例如混沌游戏表示(CGR)和读取堆积图像。随着计算机视觉和自然语言处理领域中深度学习(DL)方法的快速发展,人们越来越感兴趣地将基于图像的 DL 方法应用于基因组序列分析。这些方法涉及将基因组信息编码为图像,或者将来自图像的空间信息集成到分析过程中。在这篇综述中,我们总结了三个使用图像处理和 DL 模型进行基因组分析的典型应用。我们研究了这些基于图像的方法的利用和优势。