HyenaCircle: a HyenaDNA-based pretrained large language model for long eccDNA prediction.

Authors

Li Fuyu, Lu Wenxiang, Bai Yunfei

Affiliation

State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication

Front Genet. 2025 Jun 26;16:1641162. doi: 10.3389/fgene.2025.1641162. eCollection 2025.

DOI:10.3389/fgene.2025.1641162
PMID:40641599
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12240936/
Abstract

INTRODUCTION

Extrachromosomal circular DNA (eccDNA) is a class of circular DNA molecules derived from chromosomes, with diverse roles in disease. Long eccDNAs (typically 1-5 kb) are difficult to detect because of their size, which hinders functional studies. We propose HyenaCircle, a novel deep learning model that leverages a large language model and third-generation sequencing data to predict long eccDNA formation.

METHODS

Full-length eccDNAs of 1-5 kb were identified from Nanopore sequencing data with the FLED algorithm, extended by 100-bp flanking sequences, and paired with 20,000 length-matched negative controls from eccDNA-depleted genomic regions. HyenaCircle was built by adapting the pretrained HyenaDNA model with a purpose-designed classifier head. Data augmentation, regularization, and class-imbalance weighting were applied to increase model robustness.
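The data-preparation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the interval representation, the function names (`extend_with_flanks`, `sample_length_matched_negative`, `class_weights`), the choice to draw each negative from the same chromosome, and the inverse-frequency weighting formula are all assumptions made for the example. Only the 100-bp flank size and the idea of length-matched negatives outside eccDNA regions come from the abstract.

```python
import random
from typing import Dict, List, Tuple

# (chromosome, start, end), 0-based half-open coordinates (assumed convention)
Interval = Tuple[str, int, int]

FLANK_BP = 100  # flank size stated in the abstract


def extend_with_flanks(iv: Interval, chrom_len: int, flank: int = FLANK_BP) -> Interval:
    """Extend an interval by `flank` bp on each side, clipped to the chromosome."""
    chrom, start, end = iv
    return (chrom, max(0, start - flank), min(chrom_len, end + flank))


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sample_length_matched_negative(
    positive: Interval,
    chrom_len: int,
    excluded: List[Interval],
    rng: random.Random,
    max_tries: int = 1000,
) -> Interval:
    """Draw a random interval of the same length that avoids all `excluded` regions.

    Assumption: "eccDNA-depleted" is approximated here as "does not overlap
    any known eccDNA interval"; the paper may define it differently.
    """
    chrom, start, end = positive
    length = end - start
    for _ in range(max_tries):
        s = rng.randint(0, chrom_len - length)  # inclusive bounds
        candidate = (chrom, s, s + length)
        if not any(overlaps(candidate, e) for e in excluded):
            return candidate
    raise RuntimeError("no eccDNA-free region of the requested length found")


def class_weights(n_pos: int, n_neg: int) -> Dict[int, float]:
    """Inverse-frequency weights for a binary loss (one common weighting scheme)."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}
```

With weights of this kind, the rarer class contributes more per example to the loss, which is one usual way to counter a skew between positives and the 20,000 negatives.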

RESULTS

HyenaCircle achieved comparable performance, with a validation AUROC of 0.715 and a recall of 0.776. It surpassed DNABERT by 5.9% in AUROC and converged stably. Hyperparameter optimization confirmed batch size 16 and learning rate 5 × 10 as optimal. Ablation studies showed that flanking sequences are important: removing them reduced model stability. The model was also more stable than the baseline HyenaDNA architecture.
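For readers unfamiliar with the two reported metrics, the sketch below shows how AUROC and recall are commonly computed from labels and predicted scores. It is a plain-Python illustration under stated assumptions: the function names and the 0.5 decision threshold are chosen for the example and are not taken from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation). The pairwise loop
    is quadratic, which is fine for a sketch but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of true positives that score at or above `threshold` (assumed 0.5)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)
```

AUROC is threshold-free, whereas recall depends on the cut-off chosen, so the two numbers answer different questions about the same classifier.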

CONCLUSION

HyenaCircle integrates third-generation sequencing data with a large language model for long eccDNA prediction and outperformed the existing model. Our work demonstrates that the HyenaDNA architecture enables effective long-sequence genomic modeling and offers new insight into eccDNA prediction and identification.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f79/12240936/17fec3c189fc/fgene-16-1641162-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f79/12240936/4a8ba562bf07/fgene-16-1641162-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f79/12240936/4cb3e20cb78e/fgene-16-1641162-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f79/12240936/b38589bee6ad/fgene-16-1641162-g004.jpg