
OCRClassifier: integrating statistical control chart into machine learning framework for better detecting open chromatin regions.

Authors

Lai Xin, Liu Min, Liu Yuqian, Zhu Xiaoyan, Wang Jiayin

Affiliations

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China.

Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University, Xi'an, China.

Publication information

Front Genet. 2024 Dec 4;15:1400228. doi: 10.3389/fgene.2024.1400228. eCollection 2024.

DOI:10.3389/fgene.2024.1400228
PMID:39698466
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11652186/
Abstract

Open chromatin regions (OCRs) play a crucial role in transcriptional regulation and gene expression. In recent years, there has been growing interest in using plasma cell-free DNA (cfDNA) sequencing data to detect OCRs. By analyzing the characteristics of cfDNA fragments and their sequencing coverage, researchers can differentiate OCRs from non-OCRs. However, noise and variability in cfDNA-seq data compromise the training data used by the noise-tolerance learning-based OCR estimation approach: the training set contains numerous noisy labels that may reduce the accuracy of the results. Current OCR detection methods rely on statistical features derived from typical open and closed chromatin regions to decide whether a region is an OCR or a non-OCR. However, some atypical regions exhibit statistical features that fall between the two categories, making it difficult to classify them definitively as either open or closed chromatin regions (CCRs). These regions should be considered partially open chromatin regions (pOCRs). In this paper, we present OCRClassifier, a novel framework that combines control charts and machine learning to mitigate the impact of a high proportion of noisy labels in the training set and to accurately classify chromatin open states into three classes. Our method comprises two control charts. We first design a robust Hotelling T² control chart and create new run rules to accurately identify reliable OCRs and CCRs within the initial training set. Then, we use only the resulting pure training set of OCRs and CCRs to create and train a sensitized T² control chart. This sensitized T² control chart is specifically designed to accurately differentiate among three chromatin states: open, partially open, and closed.
Experimental results demonstrate that, under this framework, the model not only performs excellently in three-class classification but also achieves higher accuracy and sensitivity in binary classification than the state-of-the-art models currently available.
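The abstract describes two stages: a robust Hotelling T² chart that filters noisy training labels, followed by a sensitized T² chart that assigns each region to one of three classes. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. For an individual observation x, the statistic is T² = (x - μ)ᵀ S⁻¹ (x - μ), where μ and S are the mean and covariance of the reference class. Everything else here is hypothetical: the helper names (`fit_chart`, `purify`, `classify`), the classical iterative Phase I cleaning used in place of the paper's robust estimator, the chi-square control limit (via the Wilson-Hilferty approximation), the "in control on exactly one chart" decision rule, approximating "sensitized" with a tighter 95% limit, and the synthetic two-feature data. The paper's run rules are not reproduced.

```python
import random
from statistics import NormalDist, fmean


def mean_vector(rows):
    """Column-wise mean of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; adequate for a handful of features."""
    p = len(m)
    a = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(m)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(p):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[p:] for row in a]


def t_squared(x, mu, inv_cov):
    """Hotelling T^2 = (x - mu)^T S^-1 (x - mu) for a single observation x."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def chi2_ucl(p, q):
    """Large-sample upper control limit: chi-square(p) q-quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(q)
    k = 2 / (9 * p)
    return p * (1 - k + z * k ** 0.5) ** 3


def fit_chart(rows, q=0.99, max_iter=10):
    """Classical Phase I cleaning: refit mean/covariance after dropping out-of-control
    rows until none are dropped. A stand-in for the paper's robust estimator."""
    ucl = chi2_ucl(len(rows[0]), q)
    for _ in range(max_iter):
        mu = mean_vector(rows)
        inv = invert(covariance(rows, mu))
        kept = [r for r in rows if t_squared(r, mu, inv) <= ucl]
        if len(kept) == len(rows):
            break
        rows = kept
    return mu, inv, ucl


def in_control(x, chart):
    mu, inv, ucl = chart
    return t_squared(x, mu, inv) <= ucl


def purify(rows, labels, q=0.99):
    """Stage 1 sketch (hypothetical rule): keep a labelled region only if it is in
    control on its own class chart and out of control on the other class chart."""
    charts = {c: fit_chart([r for r, lab in zip(rows, labels) if lab == c], q)
              for c in ("open", "closed")}
    other = {"open": "closed", "closed": "open"}
    return [(r, lab) for r, lab in zip(rows, labels)
            if in_control(r, charts[lab]) and not in_control(r, charts[other[lab]])]


def classify(x, charts):
    """Stage 2 sketch (hypothetical rule): in control on exactly one chart gives that
    class; in control on both or neither gives "partially open"."""
    is_open = in_control(x, charts["open"])
    is_closed = in_control(x, charts["closed"])
    if is_open != is_closed:
        return "open" if is_open else "closed"
    return "partially open"


# Toy data: two synthetic 2-feature clusters, about 10% of labels flipped (hypothetical).
random.seed(0)
rows, labels = [], []
for center, label in (((0.0, 0.0), "open"), ((6.0, 6.0), "closed")):
    for _ in range(200):
        rows.append([random.gauss(c, 1.0) for c in center])
        flipped = random.random() < 0.1
        labels.append({"open": "closed", "closed": "open"}[label] if flipped else label)

clean = purify(rows, labels)
# "Sensitized" is approximated here by a tighter 95% limit (an assumption).
charts = {c: fit_chart([r for r, lab in clean if lab == c], q=0.95) for c in ("open", "closed")}
print(len(clean), classify([0, 0], charts), classify([6, 6], charts), classify([3, 3], charts))
```

In this sketch, a region that is in control on neither chart, or on both, is labelled partially open. This loosely mirrors the abstract's description of pOCRs as regions whose features fall between the two classes; the actual sensitized chart and run rules may draw that boundary differently.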


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/b3cb7d5eeab2/fgene-15-1400228-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/abf27d8c199e/fgene-15-1400228-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/d18eb7da8fb5/fgene-15-1400228-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/8fce898d34a4/fgene-15-1400228-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/3a51b8792bfd/fgene-15-1400228-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/f9dab57fd9b2/fgene-15-1400228-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/21c17017df8f/fgene-15-1400228-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/c99163ad12ee/fgene-15-1400228-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8deb/11652186/38b56ed68680/fgene-15-1400228-g008.jpg

Similar articles

1. OCRClassifier: integrating statistical control chart into machine learning framework for better detecting open chromatin regions.
   Front Genet. 2024 Dec 4;15:1400228. doi: 10.3389/fgene.2024.1400228. eCollection 2024.
2. OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions.
   Front Genet. 2023 Jun 1;14:1184744. doi: 10.3389/fgene.2023.1184744. eCollection 2023.
3. OCRDetector: Accurately Detecting Open Chromatin Regions via Plasma Cell-Free DNA Sequencing Data.
   Int J Mol Sci. 2021 May 28;22(11):5802. doi: 10.3390/ijms22115802.
4. Genomic Features of Open Chromatin Regions (OCRs) in Wild Soybean and Their Effects on Gene Expressions.
   Genes (Basel). 2021 Apr 25;12(5):640. doi: 10.3390/genes12050640.
5. Synergism of open chromatin regions involved in regulating genes in Bombyx mori.
   Insect Biochem Mol Biol. 2019 Jul;110:10-18. doi: 10.1016/j.ibmb.2019.04.014. Epub 2019 Apr 18.
6. DeepOCR: A multi-species deep-learning framework for accurate identification of open chromatin regions in livestock.
   Comput Biol Chem. 2024 Jun;110:108077. doi: 10.1016/j.compbiolchem.2024.108077. Epub 2024 Apr 19.
7. CharID: a two-step model for universal prediction of interactions between chromatin accessible regions.
   Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab602.
8. Extended T: Learning With Mixed Closed-Set and Open-Set Noisy Labels.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3047-3058. doi: 10.1109/TPAMI.2022.3180545. Epub 2023 Feb 3.
9. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.
   Nucleic Acids Res. 2017 Jan 9;45(1):54-66. doi: 10.1093/nar/gkw1061. Epub 2016 Nov 29.
10. Learning Student Network Under Universal Label Noise.
   IEEE Trans Image Process. 2024;33:4363-4376. doi: 10.1109/TIP.2024.3430539. Epub 2024 Aug 2.
