Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA.
Bioinformatics. 2021 Nov 5;37(21):3920-3922. doi: 10.1093/bioinformatics/btab684.
An abundance of new reference genomes is becoming available through large-scale sequencing efforts. While the reference FASTA for each genome is available, there is currently no automated mechanism to query a specific sequence across all new reference genomes.
We developed ACES (Analysis of Conservation with an Extensive list of Species) as a computational workflow to query specific sequences of interest (e.g. enhancers, promoters, exons) against reference genomes with an available reference FASTA. This automated workflow generates BLAST hits against each of the reference genomes, a multiple sequence alignment file, a graphical fragment assembly file and a phylogenetic tree file. These data files can then be used by the researcher in several ways to provide key insights into conservation of the query sequence.
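As an illustration of the first step described above (one BLAST search per reference genome), the following is a minimal sketch and not the ACES implementation itself. It assumes NCBI BLAST+ `blastn` is installed and that its default tabular output (`-outfmt 6`) is used. The function names, file names, and E-value cutoff are hypothetical choices made for this example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastn_command(query_fasta, genome_fasta, evalue=1e-5):
    """Build a blastn command that searches one query against one genome FASTA.

    The E-value cutoff is an assumed default for illustration only.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", genome_fasta,  # search a FASTA directly, no database build
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def best_hit(blast_tabular_text):
    """Return the highest-bitscore hit as a dict, or None if there are no hits."""
    reader = csv.reader(io.StringIO(blast_tabular_text), delimiter="\t")
    hits = [dict(zip(BLAST_TAB_FIELDS, row)) for row in reader if row]
    if not hits:
        return None
    return max(hits, key=lambda hit: float(hit["bitscore"]))


def summarize(output_by_genome):
    """Map each genome name to its best hit (or None) from raw BLAST output."""
    return {name: best_hit(text) for name, text in output_by_genome.items()}
```

In practice, each command could be run with `subprocess.run(cmd, capture_output=True, text=True)` and its standard output passed to `best_hit`. Genomes for which `best_hit` returns `None` have no detectable match to the query at the chosen cutoff.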
ACES is available at https://github.com/TNTurnerLab/ACES.
Supplementary data are available at Bioinformatics online.