New methodology for repetitive sequences identification in X and Y chromosomes.

Authors

Touati Rabeb, Tajouri Asma, Mesaoudi Imen, Oueslati Afef Elloumi, Lachiri Zied, Kharrat Maher

Affiliations

University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia.

University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia.

Publication

Biomed Signal Process Control. 2021 Feb;64:102207. doi: 10.1016/j.bspc.2020.102207. Epub 2020 Oct 19.

Abstract

Repetitive DNA sequences make up a major proportion of the human genome, as well as of the genomes of other species. The importance of each type of repetitive DNA depends on many factors, such as its structural and functional roles and the positions, lengths, and numbers of the repeats. Whether such sequences are conserved at different locations in the chromosome remains an open question for biologists. Detecting the locations of repeats despite their great variability, and finding novel repetitive sequences, also remain challenging tasks. To address this problem, we developed a new method based on signal and image processing tools. With this method, repetitive patterns can be found in DNA images regardless of the repeat length. The technique appears to detect new repetitive sequences more efficiently than classical bioinformatics tools, whose performance is limited, especially in the presence of mutations (insertions or deletions). By contrast, modifying one or a few pixels in the image does not affect the global shape of the repetitive pattern. As a result, we generated a new database of repetitive patterns containing tandem and dispersed repeats. The highly repetitive sequences we identified in the X and Y chromosomes were also found in other human chromosomes or in other genomes. The generated data were then used as input to a convolutional neural network (CNN) classifier. The resulting system is efficient, achieving an average recognition score of 94.4%.
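
The abstract does not say how DNA sequences are converted into images, so the following is only an illustration of the general idea and not the authors' encoding or pipeline. It assumes a classic dot plot, a binary self-similarity image substituted here as a stand-in, in which a tandem repeat of period p appears as dense diagonals at offset p and its multiples. A single substitution or insertion flips only a few pixels, so the diagonal at the true period stays dense. The function names and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative stand-in only: a classic dot plot, not the paper's DNA-to-image encoding.


def dot_plot(seq: str) -> List[List[int]]:
    """Binary self-similarity image: pixel (i, j) is 1 when seq[i] == seq[j]."""
    n = len(seq)
    return [[1 if seq[i] == seq[j] else 0 for j in range(n)] for i in range(n)]


def diagonal_density(image: List[List[int]], offset: int) -> float:
    """Fraction of 'on' pixels along the diagonal j = i + offset."""
    n = len(image)
    cells = [image[i][i + offset] for i in range(n - offset)]
    return sum(cells) / len(cells)


def estimate_period(seq: str, tolerance: float = 0.1) -> Tuple[int, Dict[int, float]]:
    """Estimate a tandem-repeat period from the dot-plot diagonals (hypothetical helper).

    Multiples of the true period also give dense diagonals, so the smallest
    offset whose density is within `tolerance` of the best one is returned.
    """
    max_offset = len(seq) // 2
    if max_offset < 1:
        raise ValueError("sequence too short to contain a repeat")
    image = dot_plot(seq.upper())
    scores = {k: diagonal_density(image, k) for k in range(1, max_offset + 1)}
    best = max(scores.values())
    period = min(k for k, v in scores.items() if v >= best - tolerance)
    return period, scores


if __name__ == "__main__":
    unit = "ACGTT"
    perfect = unit * 10                                # 10 exact copies, period 5
    substituted = perfect[:22] + "A" + perfect[23:]    # one G -> A substitution
    inserted = unit * 5 + "C" + unit * 5               # one inserted base
    for name, s in [("perfect", perfect), ("substitution", substituted), ("insertion", inserted)]:
        period, scores = estimate_period(s)
        print(name, period, round(scores[period], 3))
```

All three sequences should report period 5. The density of the offset-5 diagonal drops only from 1.0 to about 0.956 after one substitution and to about 0.891 after one insertion, which mirrors the abstract's point that small mutations leave the global pattern intact. The quadratic dot plot is fine for short examples but would need a more scalable representation for chromosome-length data.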

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/071b/7572123/5c5de8105248/fx1_lrg.jpg