Langa Jorge, Estonba Andone, Conklin Darrell
Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain.
Department of Computer Science and Artificial Intelligence, Faculty of Computer Science, University of the Basque Country UPV/EHU, San Sebastián, Spain.
Ecol Evol. 2020 Jul 28;10(16):8880-8893. doi: 10.1002/ece3.6587. eCollection 2020 Aug.
Population genetic studies in nonmodel organisms benefit from every available source of genomic information. This paper presents EXFI, a Python pipeline that predicts the splice graph and exon sequences from an assembled transcriptome and raw whole-genome sequencing reads. The main algorithm uses Bloom filters to remove reads that are not part of the transcriptome, predict the intron-exon boundaries, call exons from the assembly, and generate the underlying splice graph. The results are returned in GFA1 format, which encodes both the predicted exon sequences and how they are connected to form transcripts. EXFI has been tested on Linux platforms, and its source code is available under the MIT License at https://github.com/jlanga/exfi.
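To make the two central ideas concrete, the following is a minimal sketch, not EXFI's actual code. It assumes a toy Bloom filter built on `hashlib`, a small k-mer size, and forward-strand matching only (no reverse complements). The names `BloomFilter`, `kmers`, `filter_reads`, and `to_gfa1` are hypothetical and chosen for illustration. The first part discards reads that share no k-mer with the transcriptome; the second writes exons and their connections as GFA1 segment (`S`) and link (`L`) lines.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting one hash with an index.
        for seed in range(self.n_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(item.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def filter_reads(transcripts, reads, k=5):
    """Keep only reads that share at least one k-mer with a transcript.

    Assumption: forward strand only; a real tool would also consider
    reverse complements and use a much larger k.
    """
    bloom = BloomFilter()
    for transcript in transcripts:
        for kmer in kmers(transcript, k):
            bloom.add(kmer)
    return [r for r in reads if any(km in bloom for km in kmers(r, k))]


def to_gfa1(exons, links):
    """Serialize exons (name -> sequence) and exon pairs as GFA1 text.

    Assumption: every link joins two forward-oriented exons with no
    overlap, written as the CIGAR string 0M.
    """
    lines = ["H\tVN:Z:1.0"]
    lines += [f"S\t{name}\t{seq}" for name, seq in exons.items()]
    lines += [f"L\t{src}\t+\t{dst}\t+\t0M" for src, dst in links]
    return "\n".join(lines)


if __name__ == "__main__":
    kept = filter_reads(["ACGTACGTGGA"], ["TTACGTACTT", "CCCCCCCCCC"])
    print(kept)
    print(to_gfa1({"exon1": "ACGTACG", "exon2": "TGGA"}, [("exon1", "exon2")]))
```

A Bloom filter suits this step because the k-mer set of a transcriptome or of a whole-genome read set can be very large, and the filter answers membership queries in a fixed amount of memory at the cost of a small false-positive rate.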