A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations.

Affiliations

Department of Computer Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.

Human Genome Center, the Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan.

Publication information

BMC Bioinformatics. 2021 Jun 2;22(Suppl 6):128. doi: 10.1186/s12859-021-03999-8.

PMID: 34078253
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8171027/
Abstract

BACKGROUND

Understanding the functional effects of non-coding variants is important because they are often associated with altered gene expression and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the scarcity of experimentally annotated data makes this prediction intrinsically difficult, so the algorithms still need further improvement. In this work, we propose a novel method that employs a semi-supervised deep-learning model with pseudo labels, which learns from both experimentally annotated and unannotated data.
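The pseudo-labeling idea above can be illustrated with a minimal self-training loop. This is a sketch under stated assumptions, not the authors' model: a plain logistic regression stands in for their deep network, and the function names, the confidence `threshold`, the number of `rounds`, and the down-weighting factor `unlab_weight` for pseudo-labeled samples are all hypothetical choices. In each round, the current model labels the unannotated variants it is confident about, and those pseudo-labeled samples are added to the annotated ones for retraining.

```python
import math
from typing import List, Optional, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Probability that the feature vector x belongs to a functional variant (label 1)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[List[float]], y: List[int],
                 sample_weights: Optional[List[float]] = None,
                 epochs: int = 200, lr: float = 0.1) -> Model:
    """Weighted logistic regression fitted by full-batch gradient descent.

    Hypothetical stand-in for the deep network; sample_weights scale each sample's loss.
    """
    if sample_weights is None:
        sample_weights = [1.0] * len(X)
    total = sum(sample_weights)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi, si in zip(X, y, sample_weights):
            err = si * (predict_proba((w, b), xi) - yi)  # d(loss)/d(logit), weighted
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def pseudo_label_training(X_lab, y_lab, X_unlab, threshold: float = 0.9,
                          rounds: int = 3, unlab_weight: float = 0.5) -> Model:
    """Self-training: retrain on annotated data plus confident pseudo labels."""
    model = train_logreg(X_lab, y_lab)  # start from annotated data only
    for _ in range(rounds):
        X_pl, y_pl = [], []
        for x in X_unlab:
            p = predict_proba(model, x)
            # keep only confident predictions as pseudo labels
            if p >= threshold or p <= 1.0 - threshold:
                X_pl.append(x)
                y_pl.append(1 if p >= 0.5 else 0)
        weights = [1.0] * len(X_lab) + [unlab_weight] * len(X_pl)
        model = train_logreg(X_lab + X_pl, y_lab + y_pl, weights)
    return model


# Toy one-feature example (synthetic values, not data from the paper)
X_lab = [[-2.0], [-1.5], [1.5], [2.0]]
y_lab = [0, 0, 1, 1]
X_unlab = [[-3.0], [-2.5], [2.5], [3.0]]
model = pseudo_label_training(X_lab, y_lab, X_unlab)
print(round(predict_proba(model, [3.0]), 3))
```

Keeping only high-confidence predictions and down-weighting pseudo-labeled samples are common ways to limit the harm of wrong pseudo labels; the abstract does not say which safeguards, if any, the original method uses.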

RESULTS

We prepared known functional non-coding variants, annotated with histone marks, DNA accessibility, and sequence context, in the GM12878, HepG2, and K562 cell lines. On this dataset, our method outperformed existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained on data from one cell line is unlikely to perform well on other cell lines, which implies that non-coding variants act in a cell-type-specific manner. Remarkably, we found that DNA accessibility contributes significantly to the functional consequences of variants, which suggests that an open chromatin conformation is important before non-coding variants can interact with gene regulation.
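The cross-cell-line observation above corresponds to an evaluation grid: train a model on one cell line, then score it on held-out variants from every cell line, so that the diagonal measures within-line performance and the off-diagonal entries measure transfer. The sketch below assumes a toy nearest-centroid classifier and two hypothetical cell lines, `line_A` and `line_B`, whose synthetic feature-label relationship is deliberately reversed; it illustrates only the evaluation protocol, not the paper's model, features, or results.

```python
from typing import Dict, List, Tuple

Dataset = Tuple[List[List[float]], List[int]]  # (feature vectors, labels)
Centroids = Dict[int, List[float]]


def fit_centroids(X: List[List[float]], y: List[int]) -> Centroids:
    """Toy classifier (hypothetical stand-in): remember each class's mean feature vector."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Centroids, x: List[float]) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(centroids: Centroids, data: Dataset) -> float:
    X, y = data
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def cross_cell_line_matrix(
        splits: Dict[str, Tuple[Dataset, Dataset]]) -> Dict[Tuple[str, str], float]:
    """Train on each line's training split; score on every line's test split."""
    results = {}
    for train_line, (train_data, _) in splits.items():
        model = fit_centroids(*train_data)
        for test_line, (_, test_data) in splits.items():
            results[(train_line, test_line)] = accuracy(model, test_data)
    return results


# Synthetic toy data: the same feature predicts opposite labels in the two lines
splits = {
    "line_A": (([[0.0], [0.2], [1.0], [1.2]], [0, 0, 1, 1]), ([[0.1], [1.1]], [0, 1])),
    "line_B": (([[0.0], [0.2], [1.0], [1.2]], [1, 1, 0, 0]), ([[0.1], [1.1]], [1, 0])),
}
for (train_line, test_line), acc in cross_cell_line_matrix(splits).items():
    print(f"train={train_line} test={test_line} accuracy={acc:.2f}")
```

With real data, a clear drop from the diagonal to the off-diagonal entries is the kind of pattern that would support cell-type specificity; the toy data here simply forces that pattern to make the grid easy to read.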

CONCLUSIONS

The semi-supervised deep learning model coupled with pseudo labeling is advantageous when only limited datasets are available, a situation that is not unusual in biology. Our study provides an effective approach for finding non-coding mutations potentially associated with various biological phenomena, including human diseases.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fc0/8171027/797601beae61/12859_2021_3999_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fc0/8171027/c2f89e7518dc/12859_2021_3999_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fc0/8171027/6871d52de69b/12859_2021_3999_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fc0/8171027/07f2c86c5d52/12859_2021_3999_Fig4_HTML.jpg

Similar articles

1. A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations.
   BMC Bioinformatics. 2021 Jun 2;22(Suppl 6):128. doi: 10.1186/s12859-021-03999-8.
2. CPSS: Fusing consistency regularization and pseudo-labeling techniques for semi-supervised deep cardiovascular disease detection using all unlabeled electrocardiograms.
   Comput Methods Programs Biomed. 2024 Sep;254:108315. doi: 10.1016/j.cmpb.2024.108315. Epub 2024 Jul 4.
3. FaxMatch: Multi-Curriculum Pseudo-Labeling for semi-supervised medical image classification.
   Med Phys. 2023 May;50(5):3210-3222. doi: 10.1002/mp.16312. Epub 2023 Feb 21.
4. A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica.
   Med Biol Eng Comput. 2022 Apr;60(4):1159-1175. doi: 10.1007/s11517-021-02497-6. Epub 2022 Mar 3.
5. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data.
   BMC Bioinformatics. 2023 Feb 9;24(1):43. doi: 10.1186/s12859-023-05141-2.
6. Semi-supervised learning improves regulatory sequence prediction with unlabeled sequences.
   BMC Bioinformatics. 2023 May 5;24(1):186. doi: 10.1186/s12859-023-05303-2.
7. Semi-HIC: A novel semi-supervised deep learning method for histopathological image classification.
   Comput Biol Med. 2021 Oct;137:104788. doi: 10.1016/j.compbiomed.2021.104788. Epub 2021 Aug 21.
8. Boosting semi-supervised learning with Contrastive Complementary Labeling.
   Neural Netw. 2024 Feb;170:417-426. doi: 10.1016/j.neunet.2023.11.052. Epub 2023 Nov 27.
9. Pseudo-Labeling Optimization Based Ensemble Semi-Supervised Soft Sensor in the Process Industry.
   Sensors (Basel). 2021 Dec 19;21(24):8471. doi: 10.3390/s21248471.
10. Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors.
   BMC Bioinformatics. 2022 Jun 8;23(1):223. doi: 10.1186/s12859-022-04764-1.

Cited by

1. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants.
   Hum Mol Genet. 2022 Oct 20;31(R1):R62-R72. doi: 10.1093/hmg/ddac191.