Ranked k-Spectrum Kernel for Comparative and Evolutionary Comparison of Exons, Introns, and CpG Islands.

Author information

Lee Sangseon, Lee Taeheon, Noh Yung-Kyun, Kim Sun

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):1174-1183. doi: 10.1109/TCBB.2019.2938949. Epub 2021 Jun 3.

Abstract

MOTIVATION

Existing k-mer based string kernel methods have been used successfully for sequence comparison. However, they are limited for comparative and evolutionary comparisons of genomes because they are sensitive to over-represented k-mers and to variable sequence lengths.

RESULTS

In this study, we propose a novel ranked k-spectrum string (RKSS) kernel. 1) The RKSS kernel uses sets of k-mers that are common across species, named landmarks, which can be used to compare multiple genomes. 2) Based on the landmarks, we can use ranks of k-mers rather than frequencies, which can produce more robust distances between genomes. To show the power of the RKSS kernel, we conducted two experiments using exon, intron, and CpG island sequences from 10 mammalian species. The RKSS kernel reconstructed more consistent evolutionary trees than the k-spectrum string kernel. In the subsequent experiment, for each sequence, the kernel distance was calculated from 30 landmarks representing the exon, intron, and CpG island sequences of the 10 genomes. Concordance tests based on these kernel distances suggested that more information is conserved across species in CpG islands than in introns. In conclusion, our analysis suggests the relational order exon > CpG island > intron in terms of evolutionary information content.
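The abstract does not give the kernel formula, so the following is only a minimal sketch of the general idea of comparing sequences by k-mer ranks over a shared set rather than by raw frequencies. It assumes, for illustration only, that landmarks are the k-mers present in every input sequence, that ties receive average ranks, and that the distance is Euclidean; the function names are hypothetical and this is not the authors' RKSS kernel.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def landmarks(count_tables):
    """Assumed landmark definition: k-mers present in every count table."""
    shared = set(count_tables[0])
    for table in count_tables[1:]:
        shared &= set(table)
    return sorted(shared)


def rank_vector(counts, landmark_kmers):
    """Rank landmark k-mers by count (1 = rarest); tied counts share an average rank."""
    ordered = sorted(landmark_kmers, key=lambda kmer: counts[kmer])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of k-mers whose count equals that of ordered[i].
        while j + 1 < len(ordered) and counts[ordered[j + 1]] == counts[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for kmer in ordered[i:j + 1]:
            ranks[kmer] = average_rank
        i = j + 1
    return [ranks[kmer] for kmer in landmark_kmers]


def rank_distance(ranks_a, ranks_b):
    """Euclidean distance between two rank vectors (an illustrative choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b)))


# Usage with toy sequences (hypothetical data, k chosen arbitrarily).
tables = [kmer_counts(s, 2) for s in ("ACGTACGTTACG", "ACGACGTTTACGT")]
shared = landmarks(tables)
distance = rank_distance(rank_vector(tables[0], shared),
                         rank_vector(tables[1], shared))
```

Because only the ordering of counts matters in this sketch, multiplying every count in one sequence by a constant, as a longer sequence or a heavily over-represented k-mer would roughly do, leaves its rank vector unchanged. That is the kind of robustness the abstract attributes to using ranks.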
