araCNA: somatic copy number profiling using long-range sequence models.

Authors

Visscher Ellen, Yau Christopher

Affiliation

Nuffield Department for Women's & Reproductive Health, University of Oxford, Women's Centre, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom.

Publication

NAR Genom Bioinform. 2025 Sep 9;7(3):lqaf124. doi: 10.1093/nargab/lqaf124. eCollection 2025 Sep.

DOI:10.1093/nargab/lqaf124
PMID:40933674
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12418177/
Abstract

Somatic copy number alterations (CNAs) are hallmarks of cancer. Current algorithms that call CNAs from whole-genome sequencing (WGS) data have not exploited deep learning methods owing to computational scaling limitations. Here, we present a novel deep-learning approach, araCNA, which is trained only on simulated data and can accurately predict CNAs in real WGS cancer genomes. araCNA uses novel transformer alternatives (e.g. Mamba) to handle genomic-scale sequence lengths (∼1M) and learn long-range interactions. Results are extremely accurate on simulated data, and this zero-shot approach is on par with existing methods when applied to 50 WGS samples from The Cancer Genome Atlas. Notably, our approach requires only a tumour sample and not a matched normal sample, has fewer markers of overfitting, and performs inference in only a few minutes. araCNA demonstrates how domain knowledge can be used to simulate training sets that harness the power of modern machine learning in biological applications.
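The key idea above is that domain knowledge can generate labelled training data for free. The snippet below is a minimal sketch of that idea, not araCNA's actual simulator. It assumes a toy model in which a tumour genome is a run of piecewise-constant copy-number segments, the bulk sample mixes tumour cells with diploid normal cells according to a hypothetical `purity` parameter, and each bin's read count is Poisson-distributed around a depth proportional to the mixed copy number. The function names `simulate_profile` and `poisson` and all default parameter values are illustrative assumptions.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_profile(n_bins=1000, n_segments=8, purity=0.7,
                     cn_states=(1, 2, 3, 4), mean_depth=30.0, seed=0):
    """Hypothetical toy simulator: returns (true copy number, observed read depth) per bin."""
    rng = random.Random(seed)
    # Place n_segments - 1 distinct breakpoints; each segment gets one copy-number state.
    breaks = sorted(rng.sample(range(1, n_bins), n_segments - 1))
    bounds = [0] + breaks + [n_bins]
    copy_number = []
    for start, end in zip(bounds, bounds[1:]):
        copy_number.extend([rng.choice(cn_states)] * (end - start))
    # Bulk signal mixes tumour cells (fraction = purity) with diploid (CN = 2) normal cells.
    mixed = [purity * cn + (1 - purity) * 2 for cn in copy_number]
    # Rescale so the genome-wide mean expected depth equals mean_depth.
    scale = mean_depth / (sum(mixed) / n_bins)
    depth = [poisson(rng, scale * m) for m in mixed]
    return copy_number, depth


labels, reads = simulate_profile(seed=1)
print(labels[:5], reads[:5])
```

Drawing many such pairs with different seeds, purities and segment layouts yields an arbitrarily large labelled set in which `depth` is the model input and `copy_number` the per-bin target. A realistic simulator would need further effects that this toy omits (for example GC bias or allele-specific signals), and real inputs would be far longer than 1,000 bins.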



Similar articles

1. araCNA: somatic copy number profiling using long-range sequence models.
NAR Genom Bioinform. 2025 Sep 9;7(3):lqaf124. doi: 10.1093/nargab/lqaf124. eCollection 2025 Sep.
2. Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.
Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.
3. Prescription of Controlled Substances: Benefits and Risks
4. Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
5. Short-Term Memory Impairment
6. Plug-and-play use of tree-based methods: consequences for clinical prediction modeling.
J Clin Epidemiol. 2025 Aug;184:111834. doi: 10.1016/j.jclinepi.2025.111834. Epub 2025 May 19.
7. Comprehensive mutational profiling identifies new driver events in cutaneous leiomyosarcoma.
Br J Dermatol. 2025 Jan 24;192(2):335-343. doi: 10.1093/bjd/ljae386.
8. Cognitive decline assessment using semantic linguistic content and transformer deep learning architecture.
Int J Lang Commun Disord. 2024 May-Jun;59(3):1110-1127. doi: 10.1111/1460-6984.12973. Epub 2023 Nov 16.
9. CNPI: Rapid Analyses of Human Copy Number Data.
J Mol Biol. 2025 Oct 1;437(19):169313. doi: 10.1016/j.jmb.2025.169313. Epub 2025 Jun 28.
10. A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
