Sauvage Thomas, Plouviez Sophie, Schmidt William E, Fredericq Suzanne
Department of Biology, University of Louisiana at Lafayette, 410 E. Saint Mary Boulevard, Lafayette, LA, 70503, USA.
Smithsonian Marine Station, 701 Seaway Drive, Fort Pierce, FL, 34949, USA.
BMC Res Notes. 2018 Mar 5;11(1):164. doi: 10.1186/s13104-018-3268-y.
The volume of DNA sequence data lacking taxonomically informative sequence headers is growing rapidly in user and public databases (e.g., unidentified sequences and contaminants). In systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g., cryptic diversity) often requires building exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can be challenging, especially for large datasets.
We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotation of tree leaves in the popular Java tree viewer FigTree to segregate groups of FASTA sequences of interest into separate files. TREE2FASTA supports both simple and nested segregation designs, facilitating the simultaneous preparation of multiple datasets that may overlap in sequence content.
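The color-based sorting idea described above can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. It assumes that the tree was saved from FigTree as a NEXUS file whose `taxlabels` block annotates colored leaves as `name[&!color=#rrggbb]`, and that each FASTA header begins with the same name as the corresponding leaf. The function names `leaf_colors`, `read_fasta`, and `group_by_color` are hypothetical, and nested designs are not covered.

```python
import re
from collections import defaultdict

# Assumption: colored leaves appear in the taxlabels block as
# name[&!color=#rrggbb]; uncolored leaves carry no annotation.
LABEL_COLOR = re.compile(
    r"('[^']+'|[^\s\[\]';]+)\[&[^\]]*?!color=(#[0-9A-Fa-f]{6})[^\]]*\]"
)


def leaf_colors(nexus_text):
    """Return {leaf name: lowercase hex color} for colored leaves."""
    block = re.search(r"taxlabels(.*?);", nexus_text, re.IGNORECASE | re.DOTALL)
    if block is None:
        return {}
    colors = {}
    for name, color in LABEL_COLOR.findall(block.group(1)):
        colors[name.strip("'")] = color.lower()
    return colors


def read_fasta(fasta_text):
    """Return {sequence name: sequence}; the name is the first header word."""
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def group_by_color(nexus_text, fasta_text):
    """Group FASTA records by the color assigned to the matching leaf."""
    colors = leaf_colors(nexus_text)
    groups = defaultdict(dict)
    for name, seq in read_fasta(fasta_text).items():
        if name in colors:  # uncolored sequences are left out
            groups[colors[name]][name] = seq
    return dict(groups)
```

Each key of the returned dictionary corresponds to one color group, so writing every group to its own FASTA file reproduces the "one file per selection" behavior in a simplified form.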