TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees.

Authors

Sauvage Thomas, Plouviez Sophie, Schmidt William E, Fredericq Suzanne

Affiliations

Department of Biology, University of Louisiana at Lafayette, 410 E. Saint Mary Boulevard, Lafayette, LA, 70503, USA.

Smithsonian Marine Station, 701 Seaway Drive, Fort Pierce, FL, 34949, USA.

Publication information

BMC Res Notes. 2018 Mar 5;11(1):164. doi: 10.1186/s13104-018-3268-y.

DOI: 10.1186/s13104-018-3268-y
PMID: 29506565
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5838971/
Abstract

OBJECTIVE

The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets.

RESULTS

We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotation of tree leaves in the popular Java tree viewer FigTree to segregate groups of FASTA sequences of interest into separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.
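
The color-based segregation described above can be illustrated with a minimal sketch. This is not the authors' Perl implementation: it is written in Python, and it assumes that the tree file saved from FigTree carries a NEXUS `taxlabels` block in which colored leaves are annotated as `name[&!color=#rrggbb]`, and that each leaf name equals the first token of a FASTA header. The function names (`leaf_colors`, `parse_fasta`, `split_by_color`) are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

# Assumption: colored leaves look like  'name'[&!color=#rrggbb]  or
# name[&!color=#rrggbb]  inside the NEXUS taxlabels block.
LEAF_COLOR = re.compile(
    r"(?:'([^']+)'|([^\s\[\]']+))\[&[^\]]*?!color=(#[0-9A-Fa-f]{6})"
)


def leaf_colors(nexus_text):
    """Return {leaf name: hex color} for every colored leaf (hypothetical helper)."""
    block = re.search(r"taxlabels(.*?);", nexus_text, re.S | re.I)
    if block is None:
        return {}
    colors = {}
    for quoted, bare, color in LEAF_COLOR.findall(block.group(1)):
        colors[quoted or bare] = color.lower()
    return colors


def parse_fasta(text):
    """Return {sequence name: sequence}; the name is the first header token."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def split_by_color(fasta_text, nexus_text):
    """Group FASTA records by the color assigned to their tree leaf."""
    colors = leaf_colors(nexus_text)
    groups = defaultdict(list)
    for name, seq in parse_fasta(fasta_text).items():
        if name in colors:  # uncolored leaves are left out
            groups[colors[name]].append(f">{name}\n{seq}\n")
    return {color: "".join(recs) for color, recs in groups.items()}


if __name__ == "__main__":
    nexus = (
        "#NEXUS\nbegin taxa;\n\tdimensions ntax=3;\n\ttaxlabels\n"
        "\t'seq1'[&!color=#ff0000]\n\tseq2[&!color=#0000FF]\n\tseq3\n;\nend;\n"
    )
    fasta = ">seq1 sample A\nACGT\nAC\n>seq2\nGGTT\n>seq3\nTTAA\n"
    for color, records in split_by_color(fasta, nexus).items():
        print(color)
        print(records, end="")
```

Each key of the returned dictionary corresponds to one output group, so writing every value to its own file would give one FASTA file per color. Nested or overlapping designs, which the abstract mentions, would need additional leaf annotations and are not covered by this sketch.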

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5eae/5838971/af69412ea907/13104_2018_3268_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5eae/5838971/aae398e62c43/13104_2018_3268_Fig2_HTML.jpg
