OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions.

Author information

Ren Jiayi, Liu Yuqian, Zhu Xiaoyan, Wang Xuwen, Li Yifei, Liu Yuxin, Hu Wenqing, Zhang Xuanping, Wang Jiayin

Affiliations

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China.

Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, China.

Publication information

Front Genet. 2023 Jun 1;14:1184744. doi: 10.3389/fgene.2023.1184744. eCollection 2023.

DOI:10.3389/fgene.2023.1184744
PMID:37323658
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10267440/
Abstract

Open chromatin regions (OCRs) are genomic regions associated with basic cellular physiological activities, and chromatin accessibility is reported to affect gene expression and function. A basic computational problem is to estimate OCRs efficiently, which could facilitate both genomic and epigenetic studies. Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies for detecting OCRs. Because cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, because chromatin accessibility varies dynamically, it is difficult to obtain cfDNA-seq training data consisting of pure OCRs or pure non-OCRs, which creates a noise problem for both feature-based and learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerant design. The proposed approach, named OCRFinder, combines an ensemble learning framework with a semi-supervised strategy to avoid overfitting to noisy labels, that is, false positives among the labeled OCRs and non-OCRs. Compared with different noise control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracy and sensitivity in the experiments. In addition, OCRFinder performs well in ATAC-seq and DNase-seq comparison experiments.
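
The abstract describes the noise-tolerance idea only at a high level. As a rough illustration, and not the authors' implementation, the following sketch shows one generic way to combine an ensemble with a semi-supervised step: a bagged ensemble of decision stumps votes on every training label, and any label contradicted by a confident majority is treated as unlabeled and replaced by the ensemble's pseudo-label. The function names (`fit_stump`, `ensemble_vote`, `denoise_labels`), the single integer feature, and the `agree` threshold are all assumptions made for this example.

```python
import random


def fit_stump(xs, ys):
    """Fit a 1-D threshold classifier that predicts 1 when x >= threshold."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        # Count training points this threshold would misclassify.
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def ensemble_vote(stumps, x):
    """Fraction of stumps in the ensemble that call x positive."""
    return sum(x >= t for t in stumps) / len(stumps)


def denoise_labels(xs, ys, n_models=25, agree=0.8, seed=0):
    """Flag labels contradicted by a confident ensemble and pseudo-label them.

    Hypothetical helper: returns (cleaned_labels, suspect_indices).
    """
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        # Bagging: each stump is trained on a bootstrap resample.
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    cleaned, suspects = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        p = ensemble_vote(stumps, x)
        if p >= agree and y == 0:
            cleaned.append(1)   # confident positive vote overrides label 0
            suspects.append(i)
        elif p <= 1 - agree and y == 1:
            cleaned.append(0)   # confident negative vote overrides label 1
            suspects.append(i)
        else:
            cleaned.append(y)   # label kept as given
    return cleaned, suspects


# Toy data (assumed): one integer feature per region, truly "open" when >= 10.
xs = list(range(20))
truth = [int(x >= 10) for x in xs]
noisy = list(truth)
noisy[2] = 1    # a non-OCR mislabeled as an OCR
noisy[17] = 0   # an OCR mislabeled as a non-OCR

cleaned, suspects = denoise_labels(xs, noisy)
print("suspect indices:", suspects)
print("labels recovered:", cleaned == truth)
```

Only labels that a large majority of stumps contradict are changed, so points near the decision boundary, where the ensemble is genuinely uncertain, keep their original labels.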


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/5a5d9825be63/fgene-14-1184744-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/258417ce3416/fgene-14-1184744-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/5fd1af17c09c/fgene-14-1184744-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/b1aca50c1160/fgene-14-1184744-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/722625c9bea1/fgene-14-1184744-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/2a2414ad7c50/fgene-14-1184744-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/efc5a229b0b9/fgene-14-1184744-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5b1/10267440/187bba010455/fgene-14-1184744-g008.jpg


