SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups.

Affiliations

Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada.

Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada.

Publication information

BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):133. doi: 10.1186/s12859-019-2647-2.

Abstract

BACKGROUND

The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a coding DNA sequence (CDS) or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS from orthologous genes that have similar sequences and conserved splicing structures. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, is a promising yet unexplored approach for identifying splicing orthology relationships. Existing spliced alignment algorithms do not exploit information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequence. Yet this information is often available for CDS and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) that computes the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then infers transcript splicing orthology groups in a gene family from the spliced alignments.
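To make the notion of splicing structure concrete, the following minimal Python sketch derives a CDS and its exon extremities from a gene sequence with annotated exon intervals. It is an illustration only, not code from SplicedFamAlign: the function names `splice` and `cds_exon_extremities`, the toy sequence, and the 0-based half-open coordinate convention are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical convention for this sketch: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def splice(gene_seq: str, exons: List[Interval]) -> str:
    """Concatenate the exon segments of a gene sequence into a spliced CDS."""
    return "".join(gene_seq[start:end] for start, end in exons)


def cds_exon_extremities(exons: List[Interval]) -> List[Interval]:
    """Map exon intervals in gene coordinates to exon intervals in CDS coordinates."""
    extremities = []
    offset = 0
    for start, end in exons:
        length = end - start
        extremities.append((offset, offset + length))
        offset += length
    return extremities


# Toy gene: exons in upper case, introns in lower case (invented data).
gene = "ATGGCC" + "gtaagtttcag" + "GATTTC" + "gtgagtcctag" + "TGA"

# Exon-intron structure of the gene: the three exon intervals.
exons = [(0, 6), (17, 23), (34, 37)]

cds = splice(gene, exons)
extremities = cds_exon_extremities(exons)

print(cds)          # spliced CDS
print(extremities)  # exon extremities in CDS coordinates
```

In this representation, the exon intervals in gene coordinates carry the exon-intron structure of the gene, while the intervals in CDS coordinates carry the exon structure of the CDS. These are the two kinds of information that, according to the abstract, existing spliced aligners do not exploit.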

RESULTS

The experimental results show that SFA outperforms existing spliced alignment methods in both accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high across various levels of sequence similarity between the input sequences, because it accounts for their splicing structure. Notably, current spliced alignment methods are designed for cDNA-to-genome alignment and can only be repurposed for CDS-to-gene alignment, whereas SFA is the first method designed specifically for CDS-to-gene alignment.

CONCLUSION

We show the usefulness of SFA for comparing genes and transcripts within a gene family in order to analyze splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign is implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign.

