Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information.

Publication information

Brief Bioinform. 2011 Sep;12(5):485-8. doi: 10.1093/bib/bbr025. Epub 2011 Jun 11.

DOI: 10.1093/bib/bbr025
PMID: 21666252
Abstract

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
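The abstract describes two formats. SeqXML carries FASTA content with structured metadata and validated sequences. OrthoXML is a shared container for ortholog groups.

The sketch below illustrates both ideas in plain standard-library Python. It is not the authors' reference code: the paper reports read/write support in BioJava, BioPerl and Biopython, and none of those APIs are used here.

Several details are assumptions:

- The element and attribute names (`seqXML`, `entry`, `species`, `description`, `AAseq`, `orthoXML`, `gene`, `protId`, `orthologGroup`, `geneRef`) are assumed to follow the published schemas.
- The residue alphabet, the `seqXMLversion` value and the sample data are illustrative.
- XML namespaces and XSD schema validation are omitted.

```python
import re
import xml.etree.ElementTree as ET
from itertools import combinations

# Assumed residue alphabet: 20 standard amino acids, ambiguity codes B/Z/X,
# U (selenocysteine), O (pyrrolysine) and "*" (stop). The real SeqXML schema
# defines its own patterns, which may differ.
AA_PATTERN = re.compile(r"^[ACDEFGHIKLMNPQRSTVWYBZXUO*]+$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_seqxml(text, species, taxid):
    """Build a SeqXML-style tree, rejecting records with invalid residues."""
    root = ET.Element("seqXML", seqXMLversion="0.4")  # version string assumed
    for header, seq in parse_fasta(text):
        seq = seq.upper()
        if not AA_PATTERN.match(seq):
            raise ValueError(f"invalid or empty sequence in record {header!r}")
        # Split the free-text FASTA header into a structured id and description.
        entry_id, _, desc = header.partition(" ")
        entry = ET.SubElement(root, "entry", id=entry_id)
        ET.SubElement(entry, "species", name=species, ncbiTaxID=str(taxid))
        if desc:
            ET.SubElement(entry, "description").text = desc
        ET.SubElement(entry, "AAseq").text = seq
    return root


def ortholog_pairs(orthoxml_text):
    """Return cross-species protein-id pairs from flat orthologGroups.

    Simplification: nested groups and paralogGroups are not interpreted.
    """
    root = ET.fromstring(orthoxml_text)
    species_of, prot_of = {}, {}
    for sp in root.iter("species"):
        for gene in sp.iter("gene"):
            species_of[gene.get("id")] = sp.get("name")
            prot_of[gene.get("id")] = gene.get("protId")
    pairs = set()
    for group in root.iter("orthologGroup"):
        ids = [ref.get("id") for ref in group.findall("geneRef")]
        for a, b in combinations(ids, 2):
            if species_of[a] != species_of[b]:  # same-species pairs are paralogs
                pairs.add(tuple(sorted((prot_of[a], prot_of[b]))))
    return pairs


# Hypothetical sample data, shaped like a small OrthoXML file (no namespace).
SAMPLE_ORTHOXML = """<orthoXML version="0.3">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="ExampleDB" version="1">
      <genes>
        <gene id="1" protId="HUMAN1"/>
        <gene id="2" protId="HUMAN2"/>
      </genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="ExampleDB" version="1">
      <genes><gene id="3" protId="MOUSE1"/></genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="g1">
      <geneRef id="1"/><geneRef id="2"/><geneRef id="3"/>
    </orthologGroup>
  </groups>
</orthoXML>"""

if __name__ == "__main__":
    fasta = ">P1 example protein\nMKTAYIAK\nQRQISFVK\n"
    print(ET.tostring(fasta_to_seqxml(fasta, "Homo sapiens", 9606), encoding="unicode"))
    print(sorted(ortholog_pairs(SAMPLE_ORTHOXML)))
```

Here the regular-expression check stands in for the validation the abstract credits to XML. A record whose sequence contains digits, or is empty, is rejected rather than silently written out. That is the kind of FASTA integrity problem the authors describe. Real OrthoXML files declare a namespace and may nest groups, so a production parser would need to handle both.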


Similar articles

1
Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information.
Brief Bioinform. 2011 Sep;12(5):485-8. doi: 10.1093/bib/bbr025. Epub 2011 Jun 11.
2
pep2pro: a new tool for comprehensive proteome data analysis to reveal information about organ-specific proteomes in Arabidopsis thaliana.
Integr Biol (Camb). 2011 Mar;3(3):225-37. doi: 10.1039/c0ib00078g. Epub 2011 Jan 24.
3
InParanoid 7: new algorithms and tools for eukaryotic orthology analysis.
Nucleic Acids Res. 2010 Jan;38(Database issue):D196-203. doi: 10.1093/nar/gkp931. Epub 2009 Nov 5.
4
Web-based infectious disease reporting using XML forms.
Int J Med Inform. 2008 Sep;77(9):630-40. doi: 10.1016/j.ijmedinf.2007.10.011. Epub 2007 Dec 3.
5
Construction of a nasopharyngeal carcinoma 2D/MS repository with Open Source XML database--Xindice.
BMC Bioinformatics. 2006 Jan 11;7:13. doi: 10.1186/1471-2105-7-13.
6
GeneTools--application for functional annotation and statistical hypothesis testing.
BMC Bioinformatics. 2006 Oct 24;7:470. doi: 10.1186/1471-2105-7-470.
7
A standardized format for sequence data exchange.
Protein Seq Data Anal. 1987;1(1):27-39.
8
Proteomics FASTA archive and reference resource.
Proteomics. 2008 May;8(9):1756-7. doi: 10.1002/pmic.200701194.
9
Value of XML in the implementation of clinical practice guidelines--the issue of content retrieval and presentation.
Med Inform Internet Med. 2001 Apr-Jun;26(2):131-46.
10
[Computerization and the importance of information in health system, as in health care resources registry].
Acta Med Croatica. 2005;59(3):251-7.

Cited by

1
Quest for Orthologs in the Era of Biodiversity Genomics.
Genome Biol Evol. 2024 Oct 9;16(10). doi: 10.1093/gbe/evae224.
2
Scripting Analyses of Genomes in Ensembl Plants.
Methods Mol Biol. 2022;2443:27-55. doi: 10.1007/978-1-0716-2067-0_2.
3
PhylomeDB V5: an expanding repository for genome-wide catalogues of annotated gene phylogenies.
Nucleic Acids Res. 2022 Jan 7;50(D1):D1062-D1068. doi: 10.1093/nar/gkab966.
4
Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes.
PLoS Comput Biol. 2020 Jul 22;16(7):e1007553. doi: 10.1371/journal.pcbi.1007553. eCollection 2020 Jul.
5
The Quest for Orthologs benchmark service and consensus calls in 2020.
Nucleic Acids Res. 2020 Jul 2;48(W1):W538-W545. doi: 10.1093/nar/gkaa308.
6
MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life.
Nucleic Acids Res. 2020 Jul 2;48(W1):W553-W557. doi: 10.1093/nar/gkaa282.
7
BioHackathon 2015: Semantics of data for life sciences and reproducible research.
F1000Res. 2020 Feb 24;9:136. doi: 10.12688/f1000research.18236.1. eCollection 2020.
8
OMA standalone: orthology inference among public and custom genomes and transcriptomes.
Genome Res. 2019 Jul;29(7):1152-1163. doi: 10.1101/gr.243212.118. Epub 2019 Jun 24.
9
AYbRAH: a curated ortholog database for yeasts and fungi spanning 600 million years of evolution.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz022.
10
iHam and pyHam: visualizing and processing hierarchical orthologous groups.
Bioinformatics. 2019 Jul 15;35(14):2504-2506. doi: 10.1093/bioinformatics/bty994.