From pairwise to multiple spliced alignment.

Author information

Jammali Safa, Djossou Abigaïl, Ouédraogo Wend-Yam D D, Nevers Yannis, Chegrane Ibrahim, Ouangraoua Aïda

Affiliations

Département D'informatique, Faculté des Sciences, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada.

Département de Biochimie et de Génomique Fonctionnelle, Faculté de Médecine et des Sciences de la santé, Université de Sherbrooke, 3001, 12e avenue Nord, Sherbrooke (Québec) J1H 5N4, Canada.

Publication information

Bioinform Adv. 2022 Jan 5;2(1):vbab044. doi: 10.1093/bioadv/vbab044. eCollection 2022.

Abstract

MOTIVATION

Alternative splicing is a ubiquitous process in eukaryotes that allows distinct transcripts to be produced from the same gene. Yet the study of transcript evolution within a gene family is still in its infancy. One prerequisite for this study is the availability of methods to compare sets of transcripts while accounting for their splicing structure. In this context, we generalize the concept of pairwise spliced alignments (PSpAs) to multiple spliced alignments (MSpAs). Beyond enabling the study of transcript evolution, MSpAs serve several important purposes. For instance, they are key to improving the prediction of gene models, which is important for addressing the growing problem of genome annotation. Despite their importance, a formal definition of the concept and methods to compute MSpAs are still lacking.

RESULTS

We introduce the MSpA problem and the SplicedFamAlignMulti (SFAM) method to compute the MSpA of a gene family. Like most multiple sequence alignment (MSA) methods, which are generally greedy heuristics that assemble pairwise alignments, SFAM combines all PSpAs of the coding DNA sequences and gene sequences of a gene family into an MSpA. It produces a single structure that represents the superstructure and models of the gene family. Using real vertebrate and simulated gene family data, we illustrate the utility of SFAM for computing accurate gene family superstructures and MSAs, inferring splicing orthologous groups, and improving gene-model annotations.
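The general idea of assembling pairwise alignments into a multiple alignment can be illustrated with a toy sketch. The code below is not SFAM's algorithm, which the abstract does not describe in detail. It assumes a simplified, hypothetical input in which each pairwise alignment has already been reduced to a list of matched blocks, each block written as a `(sequence_name, block_index)` tuple, and it groups blocks that are linked, directly or transitively, by any pairwise match. The function name `merge_pairwise_blocks` and the input format are illustrative assumptions.

```python
from collections import defaultdict


def merge_pairwise_blocks(pairwise_matches):
    """Group blocks connected by pairwise matches (toy illustration only).

    pairwise_matches: iterable of (block_a, block_b) pairs, where each block
    is a hypothetical (sequence_name, block_index) tuple.
    Returns a sorted list of groups, each group a sorted list of blocks.
    """
    parent = {}

    def find(block):
        # Register unseen blocks as their own group representative.
        parent.setdefault(block, block)
        while parent[block] != block:
            parent[block] = parent[parent[block]]  # path halving
            block = parent[block]
        return block

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each pairwise match places its two blocks in the same group.
    for a, b in pairwise_matches:
        union(a, b)

    # Collect the blocks of each group under its representative.
    groups = defaultdict(set)
    for block in list(parent):
        groups[find(block)].add(block)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical matches between CDS exons and gene segments.
matches = [
    (("cds1", 0), ("geneA", 0)),
    (("cds1", 0), ("geneB", 0)),
    (("cds1", 1), ("geneA", 1)),
    (("cds2", 1), ("geneA", 1)),
]
for group in merge_pairwise_blocks(matches):
    print(group)
```

In this sketch, `geneA` and `geneB` end up in the same group only because both are matched to the same block of `cds1`. A real method would also have to handle conflicting matches and coordinate ranges, which this toy version ignores.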

AVAILABILITY AND IMPLEMENTATION

The supporting data and implementation of SFAM are freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlignMulti.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
