Prediction of human pathogenic start loss variants based on self-supervised contrastive learning.

Authors

Liu Jie, Fan Henghui, Cheng Na, Su Yansen, Xia Junfeng

Affiliations

Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.

School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, Anhui, China.

Publication information

BMC Biol. 2025 Aug 8;23(1):250. doi: 10.1186/s12915-025-02348-y.

PMID: 40781627

Abstract

BACKGROUND: Start loss variants are a class of genetic variants that affect the bases of the start codon, disrupting the normal translation initiation process and leading to protein deletions or the production of different proteins. Accurate assessment of the pathogenicity of these variants is crucial for deciphering disease mechanisms and integrating genomics into clinical practice. However, among the tens of thousands of start loss variants in the human genome, only about 1% have been classified as pathogenic or benign. Computational methods that rely solely on small amounts of labeled data often lack sufficient generalization capabilities, restricting their effectiveness in predicting the impact of start loss variants.

RESULTS: Here, we introduce StartCLR, a novel prediction method specifically designed for identifying pathogenic start loss variants. StartCLR captures variant context information from different dimensions by integrating embedding features from diverse DNA language models. Moreover, it employs self-supervised pre-training combined with supervised fine-tuning, enabling the effective utilization of both a large amount of unlabeled data and a small amount of labeled data to enhance prediction accuracy. Our experimental results show that StartCLR exhibits strong generalization and superior prediction performance across different test sets. Notably, when trained exclusively on high-confidence labeled data, StartCLR retains or even improves the prediction accuracy despite the reduced amount of labeled data.

CONCLUSIONS: Collectively, these findings highlight the potential of integrating self-supervised contrastive learning with unlabeled data to mitigate the challenge posed by the scarcity of labeled start loss variants.
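
The abstract describes two ideas that a short sketch can make concrete: fusing embeddings from several DNA language models, and pre-training on unlabeled variants with a contrastive objective. The Python below is an illustration only and is not the StartCLR implementation. The abstract does not state which contrastive loss or fusion scheme the authors use, so the NT-Xent loss (a common contrastive objective) and the normalize-then-concatenate fusion are assumptions, the function names `fuse_embeddings` and `nt_xent_loss` are hypothetical, and random arrays stand in for real language-model embeddings.

```python
import numpy as np


def fuse_embeddings(blocks):
    """Concatenate per-model embeddings for the same variants (hypothetical helper).

    Each block has shape (n_variants, dim_k). Every block is L2-normalized
    row-wise first so that no single model dominates by scale (an assumption,
    not something stated in the abstract).
    """
    normed = [b / np.linalg.norm(b, axis=1, keepdims=True) for b in blocks]
    return np.concatenate(normed, axis=1)


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two views of the same batch.

    Row i of z1 and row i of z2 form a positive pair; every other row in the
    combined batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Pairwise cosine similarities, scaled by the temperature.
    sim = (z @ z.T) / temperature
    # A sample must not be compared with itself.
    np.fill_diagonal(sim, -np.inf)

    # Index of the positive partner for every row: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    return float(-log_prob[np.arange(2 * n), pos].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for embeddings of 8 unlabeled variants from two models.
    model_a = rng.normal(size=(8, 16))
    model_b = rng.normal(size=(8, 32))
    fused = fuse_embeddings([model_a, model_b])

    # Two "views" of each variant: here, small Gaussian perturbations.
    view1 = fused + 0.05 * rng.normal(size=fused.shape)
    view2 = fused + 0.05 * rng.normal(size=fused.shape)
    print("contrastive loss:", nt_xent_loss(view1, view2))
```

In a full pipeline of this kind, an encoder would be trained to minimize such a loss on the large unlabeled set and then fine-tuned with a supervised classification head on the small labeled set. The sketch shows only the loss computation.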
