Methods for constructing and evaluating consensus genomic interval sets.

Affiliations

Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA.

Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA.

Publication information

Nucleic Acids Res. 2024 Sep 23;52(17):10119-10131. doi: 10.1093/nar/gkae685.

DOI:10.1093/nar/gkae685
PMID:39180401
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11417377/
Abstract

The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.
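
The abstract names a coverage cutoff method for building a universe and a base-level overlap score for evaluating it, but gives no formulas. The following is a minimal Python sketch of the general idea only, not the authors' implementation. It assumes BED-style 0-based, half-open intervals; the function names (`merge_intervals`, `coverage_cutoff_universe`, `base_jaccard`) are hypothetical, and the Jaccard index is a stand-in for a base-level overlap measure, since the paper's exact score is not specified here.

```python
from collections import defaultdict

# An interval is (chrom, start, end), 0-based and half-open (an assumption,
# following BED conventions; the abstract does not state a coordinate system).


def merge_intervals(regions):
    """Merge overlapping or abutting intervals within one region set."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def coverage_cutoff_universe(region_sets, cutoff):
    """Keep every stretch of genome covered by at least `cutoff` region sets."""
    # chrom -> position -> net change in coverage at that position
    deltas = defaultdict(lambda: defaultdict(int))
    for regions in region_sets:
        # Merge first so each region set adds at most 1 to the coverage.
        for chrom, start, end in merge_intervals(regions):
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1

    universe = []
    for chrom in sorted(deltas):
        depth, open_start = 0, None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= cutoff and open_start is None:
                open_start = pos  # coverage rose to the cutoff
            elif depth < cutoff and open_start is not None:
                universe.append((chrom, open_start, pos))  # coverage fell below it
                open_start = None
    return universe


def total_bases(regions):
    """Number of distinct bases covered by a list of intervals."""
    return sum(end - start for _, start, end in merge_intervals(regions))


def base_jaccard(universe, regions):
    """Shared bases divided by bases covered by either set (illustrative only)."""
    a, b = total_bases(universe), total_bases(regions)
    union = total_bases(list(universe) + list(regions))
    intersection = a + b - union  # inclusion-exclusion
    return intersection / union if union else 1.0


# Toy collection of three region sets (made-up coordinates).
region_sets = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr1", 400, 500)],
]

union_universe = coverage_cutoff_universe(region_sets, cutoff=1)
majority_universe = coverage_cutoff_universe(region_sets, cutoff=2)

print(union_universe)     # [('chr1', 100, 300), ('chr1', 400, 500)]
print(majority_universe)  # [('chr1', 150, 250)]
print(base_jaccard(majority_universe, region_sets[0]))
```

With a cutoff of 1 the sketch reproduces the plain merge (union) of all regions, the common approach the abstract cautions about; raising the cutoff trades coverage for tighter region boundaries, which is the kind of trade-off an evaluation score is meant to quantify.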

Figures

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/a5a560f6a359/gkae685figgra1.jpg
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/773b45028bd8/gkae685fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/d4072008a975/gkae685fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/98c0404c089f/gkae685fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/9e5a9499f35e/gkae685fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/7db356c54f34/gkae685fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/9af1da043f2a/gkae685fig6.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/06a0f95cd7c7/gkae685fig7.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/76cb2c43f605/gkae685fig8.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d602/11417377/5594e9f61962/gkae685fig9.jpg

Cited by

1
High level of aneuploidy and recurrent loss of chromosome 11 as relevant features of somatotroph pituitary tumors.
J Transl Med. 2024 Nov 4;22(1):994. doi: 10.1186/s12967-024-05736-0.
2
Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets.
Bioengineering (Basel). 2024 Mar 8;11(3):263. doi: 10.3390/bioengineering11030263.
