• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

基于增强型范围树的高效基因组区间查询。

Efficient Genomic Interval Queries Using Augmented Range Trees.

机构信息

Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

出版信息

Sci Rep. 2019 Mar 25;9(1):5059. doi: 10.1038/s41598-019-41451-3.

DOI:10.1038/s41598-019-41451-3
PMID:30911095
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC6434014/
Abstract

Efficient large-scale annotation of genomic intervals is essential for personal genome interpretation in the realm of precision medicine. There are 13 possible relations between two intervals according to Allen's interval algebra. Conventional interval trees are routinely used to identify the genomic intervals satisfying a coarse relation with a query interval, but cannot support efficient query for more refined relations such as all Allen's relations. We design and implement a novel approach to address this unmet need. Through rewriting Allen's interval relations, we transform an interval query to a range query, then adapt and utilize the range trees for querying. We implement two types of range trees: a basic 2-dimensional range tree (2D-RT) and an augmented range tree with fractional cascading (RTFC) and compare them with the conventional interval tree (IT). Theoretical analysis shows that RTFC can achieve the best time complexity for interval queries regarding all Allen's relations among the three trees. We also perform comparative experiments on the efficiency of RTFC, 2D-RT and IT in querying noncoding element annotations in a large collection of personal genomes. Our experimental results show that 2D-RT is more efficient than IT for interval queries regarding most of Allen's relations, RTFC is even more efficient than 2D-RT. The results demonstrate that RTFC is an efficient data structure for querying large-scale datasets regarding Allen's relations between genomic intervals, such as those required by interpreting genome-wide variation in large populations.

摘要

高效的大规模基因组区间标注对于精准医学领域的个人基因组解释至关重要。根据艾伦区间代数,两个区间之间有 13 种可能的关系。传统的区间树通常用于识别与查询区间具有粗粒度关系的基因组区间,但无法支持对更精细关系(如所有艾伦关系)的高效查询。我们设计并实现了一种新颖的方法来满足这一未满足的需求。通过重写艾伦区间关系,我们将区间查询转换为范围查询,然后适当地利用范围树进行查询。我们实现了两种类型的范围树:基本的二维范围树(2D-RT)和具有分数级联的增强范围树(RTFC),并将它们与传统的区间树(IT)进行比较。理论分析表明,在三种树中,RTFC 可以实现针对所有艾伦关系的区间查询的最佳时间复杂度。我们还在大规模个人基因组中大量非编码元件注释的查询效率方面对 RTFC、2D-RT 和 IT 进行了比较实验。实验结果表明,在针对大多数艾伦关系的区间查询方面,2D-RT 比 IT 更有效,而 RTFC 甚至比 2D-RT 更有效。结果表明,RTFC 是一种高效的数据结构,可用于查询大规模数据集的艾伦关系,例如在大规模人群中解释全基因组变异所需的关系。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/4a1c7d07b41b/41598_2019_41451_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/6e6cd0662a91/41598_2019_41451_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/7912b4013704/41598_2019_41451_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/aa7b8fd2938a/41598_2019_41451_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/a1f5672d93cf/41598_2019_41451_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/4a1c7d07b41b/41598_2019_41451_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/6e6cd0662a91/41598_2019_41451_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/7912b4013704/41598_2019_41451_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/aa7b8fd2938a/41598_2019_41451_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/a1f5672d93cf/41598_2019_41451_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2252/6434014/4a1c7d07b41b/41598_2019_41451_Fig5_HTML.jpg

相似文献

1
Efficient Genomic Interval Queries Using Augmented Range Trees.基于增强型范围树的高效基因组区间查询。
Sci Rep. 2019 Mar 25;9(1):5059. doi: 10.1038/s41598-019-41451-3.
2
Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.电子病历中自然语言处理的远距离标注高效查询
Biomed Inform Insights. 2016 Jul 19;8:29-38. doi: 10.4137/BII.S38916. eCollection 2016.
3
A space-efficient construction of the Burrows-Wheeler transform for genomic data.一种用于基因组数据的布罗-惠勒变换的节省空间的构建方法。
J Comput Biol. 2005 Sep;12(7):943-51. doi: 10.1089/cmb.2005.12.943.
4
Transradial Approach for Cardiac Catheterization in Patients With Negative Allen's Test.对艾伦试验阴性患者行桡动脉途径心脏导管插入术
J Invasive Cardiol. 2015 Sep;27(9):416-20. Epub 2015 Jun 15.
5
Augmented Interval List: a novel data structure for efficient genomic interval search.增强型区间列表:一种用于高效基因组区间搜索的新型数据结构。
Bioinformatics. 2019 Dec 1;35(23):4907-4911. doi: 10.1093/bioinformatics/btz407.
6
A method for the graphical modeling of relative temporal constraints.一种相对时间约束的图形建模方法。
J Biomed Inform. 2019 Dec;100:103314. doi: 10.1016/j.jbi.2019.103314. Epub 2019 Oct 17.
7
Geographical variation in bill size across bird species provides evidence for Allen's rule.鸟类物种喙部大小的地理变异为艾伦法则提供了证据。
Am Nat. 2010 Aug;176(2):188-97. doi: 10.1086/653666.
8
Reliability of Allen's test in selection of patients for radial artery harvest.艾伦试验在选择桡动脉采集患者中的可靠性。
Ann Thorac Surg. 2000 Oct;70(4):1362-5. doi: 10.1016/s0003-4975(00)01551-4.
9
Evaluation of the palmar circulation by pulse oximetry.通过脉搏血氧饱和度仪评估手掌循环。
J Clin Monit. 1989 Jan;5(1):1-3. doi: 10.1007/BF01618362.
10
DiagHunter and GenoPix2D: programs for genomic comparisons, large-scale homology discovery and visualization.DiagHunter和GenoPix2D:用于基因组比较、大规模同源性发现及可视化的程序。
Genome Biol. 2003;4(10):R68. doi: 10.1186/gb-2003-4-10-r68. Epub 2003 Sep 19.

引用本文的文献

1
Gonomics: uniting high performance and readability for genomics with Go.基诺米克斯:用 Go 为基因组学实现高性能和易读性的统一。
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad516.
2
Temporal Cohort Logic.时间队列逻辑。
AMIA Annu Symp Proc. 2023 Apr 29;2022:1237-1246. eCollection 2022.

本文引用的文献

1
Autism genetics: opportunities and challenges for clinical translation.自闭症遗传学:临床转化的机遇与挑战。
Nat Rev Genet. 2017 Jun;18(6):362-376. doi: 10.1038/nrg.2017.4. Epub 2017 Mar 6.
2
Analysis of protein-coding genetic variation in 60,706 humans.对60706名人类的蛋白质编码基因变异进行分析。
Nature. 2016 Aug 18;536(7616):285-91. doi: 10.1038/nature19057.
3
Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.电子病历中自然语言处理的远距离标注高效查询
Biomed Inform Insights. 2016 Jul 19;8:29-38. doi: 10.4137/BII.S38916. eCollection 2016.
4
Role of non-coding sequence variants in cancer.非编码序列变异在癌症中的作用。
Nat Rev Genet. 2016 Feb;17(2):93-108. doi: 10.1038/nrg.2015.17. Epub 2016 Jan 19.
5
An Efficient Search Algorithm for Finding Genomic-Range Overlaps Based on the Maximum Range Length.一种基于最大范围长度寻找基因组范围重叠的高效搜索算法。
IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):778-84. doi: 10.1109/TCBB.2014.2369042.
6
Software for computing and annotating genomic ranges.基因组范围计算和注释软件。
PLoS Comput Biol. 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118. Epub 2013 Aug 8.
7
Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.使用嵌套包含列表从关系型数据库系统中快速存储和检索基因组区间。
Database (Oxford). 2013 Jul 26;2013:bat056. doi: 10.1093/database/bat056. Print 2013.
8
Binary Interval Search: a scalable algorithm for counting interval intersections.二进制区间搜索:一种用于计算区间交集的可扩展算法。
Bioinformatics. 2013 Jan 1;29(1):1-7. doi: 10.1093/bioinformatics/bts652. Epub 2012 Nov 4.
9
An integrated encyclopedia of DNA elements in the human genome.人类基因组中 DNA 元件的综合百科全书。
Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
10
BEDOPS: high-performance genomic feature operations.BEDOPS:高性能基因组特征操作。
Bioinformatics. 2012 Jul 15;28(14):1919-20. doi: 10.1093/bioinformatics/bts277. Epub 2012 May 9.