Bio-Strings: A Relational Database Data-Type for Dealing with Large Biosequences.

Authors

Lifschitz Sergio, Haeusler Edward H, Catanho Marcos, Miranda Antonio B de, Armas Elvismary Molina de, Heine Alexandre, Moreira Sergio G M P, Tristão Cristian

Affiliations

Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Rio de Janeiro 22451-900, Brazil.

Lab. Genética Molecular de Microrganismos, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro 21040-900, Brazil.

Publication information

BioTech (Basel). 2022 Jul 30;11(3):31. doi: 10.3390/biotech11030031.

DOI:10.3390/biotech11030031
PMID:35997339
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9472027/
Abstract

DNA sequencers output large sets of very long biological data strings, which should be persisted in databases rather than in plain text files. Many data models and database management systems (DBMS) can address the storage and efficiency issues of genomic datasets. Specifically, there is a need to handle strings of variable size while preserving their biological meaning. Relational database management systems (RDBMS) provide several data types that could be further explored in the genomics context. In addition, they enforce integrity and consistency and offer good abstractions for more conventional data. We propose using the relational text data type to represent and manipulate biological sequences and their derivatives. We present a logical schema for representing the core biological information, which may be inferred from a given biological conceptual data schema, together with the corresponding manipulation functions. We implement these stored functions in an actual RDBMS and evaluate them for both efficacy and efficiency. We show that it is possible to enforce basic and complex requirements for the genomic domain. We claim that the well-established relational text data type in RDBMS can appropriately handle the representation and persistence of biological sequences. Our approach is based on the idea of domain-specific abstract data types, which store data together with semantically defined functions while hiding those details from non-technical end users.
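
The idea of a text column paired with domain-specific functions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the RDBMS or the functions used, so the example assumes SQLite through Python's standard `sqlite3` module, and the table `biosequence`, the column `residues`, and the functions `reverse_complement` and `gc_content` are hypothetical names chosen for illustration. A `CHECK` constraint stands in for "keeping the biological meaning" by rejecting characters outside an assumed DNA alphabet.

```python
import sqlite3

# Translation table for DNA complements (assumed alphabet: A, C, G, T, N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string; NULL passes through."""
    if seq is None:
        return None
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C residues, or NULL for an empty value."""
    if not seq:
        return None
    return (seq.count("G") + seq.count("C")) / len(seq)


def open_db():
    """Create an in-memory database with a text-typed sequence column.

    The schema and function names are illustrative assumptions only.
    """
    conn = sqlite3.connect(":memory:")
    # Register the Python functions so they can be called from SQL,
    # playing the role of stored functions in a server-based RDBMS.
    conn.create_function("reverse_complement", 1, reverse_complement)
    conn.create_function("gc_content", 1, gc_content)
    conn.execute(
        """
        CREATE TABLE biosequence (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            -- Reject any character outside the assumed DNA alphabet.
            residues TEXT NOT NULL CHECK (residues NOT GLOB '*[^ACGTN]*')
        )
        """
    )
    return conn


if __name__ == "__main__":
    db = open_db()
    db.execute(
        "INSERT INTO biosequence (name, residues) VALUES (?, ?)",
        ("demo", "AAGC"),
    )
    query = (
        "SELECT name, reverse_complement(residues), gc_content(residues) "
        "FROM biosequence"
    )
    for row in db.execute(query):
        print(row)
```

End users query the sequences with ordinary SQL and never see how the functions are implemented, which is the kind of detail hiding the abstract describes. In a server-based RDBMS the same role would typically be played by stored functions defined inside the database rather than by functions registered from client code.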
