Identifying elemental genomic track types and representing them uniformly.

Affiliation

Department of Tumor Biology, The Norwegian Radium Hospital, Oslo University Hospital, Montebello, 0310 Oslo, Norway.

Publication information

BMC Bioinformatics. 2011 Dec 30;12:494. doi: 10.1186/1471-2105-12-494.

DOI:10.1186/1471-2105-12-494
PMID:22208806
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3315820/
Abstract

BACKGROUND

With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.

RESULTS

We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0.
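
The count of fifteen is consistent with one simple reading of the four properties, sketched below. The sketch assumes that each property (gaps, lengths, values, interconnections) is either present or absent in a track, and that the combination with no property at all is excluded; the abstract itself does not state this derivation or name the resulting types, so the enumeration is illustrative only. The function name `enumerate_combinations` is hypothetical.

```python
from itertools import product

# The four informational properties named in the abstract.
PROPERTIES = ("gaps", "lengths", "values", "interconnections")


def enumerate_combinations():
    """Return every non-empty combination of PROPERTIES as a tuple of names."""
    combos = []
    for flags in product((False, True), repeat=len(PROPERTIES)):
        present = tuple(name for name, on in zip(PROPERTIES, flags) if on)
        # Assumption: a track with none of the four properties carries no
        # information, so the empty combination is skipped.
        if present:
            combos.append(present)
    return combos


combinations = enumerate_combinations()
print(len(combinations))  # 2**4 - 1 combinations
for combo in combinations:
    print(" + ".join(combo))
```

Under these assumptions the arithmetic is 2^4 - 1 = 15, which matches the number of generic track types reported.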

CONCLUSIONS

The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1017/3315820/5257a8db3b7d/1471-2105-12-494-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1017/3315820/b369e5a42d85/1471-2105-12-494-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1017/3315820/66f4a7465100/1471-2105-12-494-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1017/3315820/ae5be9ece7e3/1471-2105-12-494-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1017/3315820/b320e185c9ab/1471-2105-12-494-5.jpg
