SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups.

Affiliations

Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada.

Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada.

Publication information

BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):133. doi: 10.1186/s12859-019-2647-2.

DOI: 10.1186/s12859-019-2647-2

PMID: 30925859

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6439985/

Abstract

BACKGROUND

The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a coding DNA sequence (CDS) or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS with similar sequences and conserved splicing structures from orthologous genes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, is a promising yet unexplored approach for identifying splicing orthology relationships. Existing spliced alignment algorithms do not exploit information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequence. Yet this information is often available for CDS and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) that computes the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then infers transcript splicing orthology groups in a gene family from the spliced alignments.
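To make the notion of splicing structure concrete, the following is a minimal Python sketch of how exon extremities in a CDS relate to exon-intron extremities in a gene sequence. It is an illustration only and is not taken from the SplicedFamAlign source code: the function names, the half-open 0-based coordinate convention, and the toy sequence are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed convention for this sketch: half-open [start, end), 0-based.
Interval = Tuple[int, int]


def intron_intervals(gene_exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons in gene coordinates."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(gene_exons, gene_exons[1:])]


def splice(gene_seq: str, gene_exons: List[Interval]) -> str:
    """Concatenate the exon segments of the gene to obtain the spliced CDS."""
    return "".join(gene_seq[start:end] for start, end in gene_exons)


def cds_exon_extremities(gene_exons: List[Interval]) -> List[Interval]:
    """Exon extremities expressed in CDS coordinates (introns removed)."""
    extremities, offset = [], 0
    for start, end in gene_exons:
        extremities.append((offset, offset + (end - start)))
        offset += end - start
    return extremities


# Toy gene (made-up sequence): two exons separated by one GT...AG intron.
gene = "ATGGCC" + "GTAAGTTTCAG" + "GATTGA"
exons_in_gene = [(0, 6), (17, 23)]

cds = splice(gene, exons_in_gene)
print(cds)                                  # spliced CDS
print(intron_intervals(exons_in_gene))      # exon-intron extremities in the gene
print(cds_exon_extremities(exons_in_gene))  # exon extremities in the CDS
```

In this toy setting, a structure-aware spliced alignment would be expected to place the CDS exon junction at the annotated intron of the gene rather than at an arbitrary position with similar sequence similarity.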

RESULTS

The experimental results show that SFA outperforms existing spliced alignment methods in both accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high across various levels of sequence similarity between the input sequences, because it accounts for their splicing structure. Note that, unlike current spliced alignment methods, which are designed for cDNA-to-genome alignment but can also be used for CDS-to-gene alignment, SFA is the first method specifically designed for CDS-to-gene alignment.

CONCLUSION

We show the usefulness of SFA for comparing genes and transcripts within a gene family in order to analyze splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign is implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign .

Similar articles

SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution.

BMC Bioinformatics. 2019 Dec 17;20(Suppl 20):640. doi: 10.1186/s12859-019-3207-5.

SimSpliceEvol2: alternative splicing-aware simulation of biological sequence evolution and transcript phylogenies.

BMC Bioinformatics. 2024 Jul 11;25(1):235. doi: 10.1186/s12859-024-05853-z.

Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog.

BMC Genomics. 2022 Mar 18;23(1):216. doi: 10.1186/s12864-022-08429-4.

From pairwise to multiple spliced alignment.

Bioinform Adv. 2022 Jan 5;2(1):vbab044. doi: 10.1093/bioadv/vbab044. eCollection 2022.

Assessment of orthologous splicing isoforms in human and mouse orthologous genes.

BMC Genomics. 2010 Oct 1;11:534. doi: 10.1186/1471-2164-11-534.

PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text.

BMC Bioinformatics. 2012 Apr 12;13 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-13-S5-S2.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring.

J Mol Biol. 2000 Apr 14;297(5):1075-85. doi: 10.1006/jmbi.2000.3641.

Splign: algorithms for computing spliced alignments with identification of paralogs.

Biol Direct. 2008 May 21;3:20. doi: 10.1186/1745-6150-3-20.

Cited by

Quest for Orthologs in the Era of Biodiversity Genomics.

Genome Biol Evol. 2024 Oct 9;16(10). doi: 10.1093/gbe/evae224.

SimSpliceEvol2: alternative splicing-aware simulation of biological sequence evolution and transcript phylogenies.

BMC Bioinformatics. 2024 Jul 11;25(1):235. doi: 10.1186/s12859-024-05853-z.

From pairwise to multiple spliced alignment.

Bioinform Adv. 2022 Jan 5;2(1):vbab044. doi: 10.1093/bioadv/vbab044. eCollection 2022.

Detection of orthologous exons and isoforms using EGIO.

Bioinformatics. 2022 Sep 30;38(19):4474-4480. doi: 10.1093/bioinformatics/btac548.

Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog.

BMC Genomics. 2022 Mar 18;23(1):216. doi: 10.1186/s12864-022-08429-4.

ExceS-A: an exon-centric split aligner.

J Integr Bioinform. 2022 Mar 7;19(1):20210040. doi: 10.1515/jib-2021-0040.

Ten Years of Collaborative Progress in the Quest for Orthologs.

Mol Biol Evol. 2021 Jul 29;38(8):3033-3045. doi: 10.1093/molbev/msab098.

SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution.

BMC Bioinformatics. 2019 Dec 17;20(Suppl 20):640. doi: 10.1186/s12859-019-3207-5.
