LYCEUM: learning to call copy number variants on low-coverage ancient genomes.

Authors

Yılmaz Mehmet Alper, Ceylan Ahmet Arda, Kaynar Gun, Çiçek A Ercüment

Affiliations

Department of Computer Engineering, Bilkent University, Ankara 06800, Türkiye.

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States.

Publication

Bioinformatics. 2025 Jul 1;41(Supplement_1):i285-i293. doi: 10.1093/bioinformatics/btaf244.

DOI:10.1093/bioinformatics/btaf244
PMID:40662803
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12261418/
Abstract

MOTIVATION

Copy number variants (CNVs) are pivotal in driving phenotypic variation that facilitates species adaptation. They are significant contributors to various disorders, making ancient genomes crucial for uncovering the genetic origins of disease susceptibility across populations. However, detecting CNVs in ancient DNA (aDNA) samples poses substantial challenges due to several factors: (i) aDNA is often highly degraded; (ii) contamination from microbial DNA and DNA from closely related species introduces additional noise into sequencing data; and finally, (iii) the typically low coverage of aDNA renders accurate CNV detection particularly difficult. Conventional CNV calling algorithms, which are optimized for high-coverage read-depth signals, underperform under such conditions.
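To make the low-coverage problem concrete, the sketch below uses a deliberately simplified model that is not taken from the paper. It assumes that reads per genomic bin follow a Poisson distribution with mean λ = coverage × bin size / read length. A heterozygous deletion is assumed to halve that mean. A naive read-depth caller then flags a bin as deleted when its count falls below the midpoint 0.75λ. The bin size (1 kb), the read length (50 bp), and the function names are illustrative assumptions. Real aDNA depth is overdispersed and affected by GC content, mappability, and contamination, so the numbers only indicate the trend.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space (lam > 0)."""
    if k < 0:
        return 0.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)  # clamp tiny floating-point overshoot


def deletion_call_error(coverage: float, bin_size: int = 1000, read_length: int = 50) -> tuple[float, float]:
    """Return (false-positive rate, false-negative rate) of a naive threshold caller.

    Assumed model: reads per bin ~ Poisson(lam) at copy number 2 and
    Poisson(lam / 2) for a heterozygous deletion (copy number 1).
    A bin is called "deleted" when its count is below 0.75 * lam.
    """
    lam = coverage * bin_size / read_length
    max_del_count = math.ceil(0.75 * lam) - 1          # largest count still called "deleted"
    fpr = poisson_cdf(max_del_count, lam)              # diploid bin wrongly called deleted
    fnr = 1.0 - poisson_cdf(max_del_count, lam / 2)    # deleted bin missed
    return fpr, fnr


for cov in (0.1, 1.0, 30.0):
    fpr, fnr = deletion_call_error(cov)
    print(f"{cov:>5}x  FPR={fpr:.3g}  FNR={fnr:.3g}")
```

Under these assumptions, the caller behaves very differently at low and high coverage. At 0.1x it mislabels about 41% of diploid bins and misses about 26% of deleted bins. At 30x both error rates are negligible. Callers tuned for high-coverage read depth break down in the low-coverage regime.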

RESULTS

To address these limitations, we introduce LYCEUM, the first machine learning-based CNV caller for aDNA. To overcome challenges related to data quality and scarcity, we employ a two-step training strategy. First, the model is pre-trained on whole genome sequencing data from the 1000 Genomes Project, which teaches it CNV-calling capabilities similar to those of conventional methods. Next, the model is fine-tuned using high-confidence CNV calls derived from only a few existing high-coverage aDNA samples. During this stage, the model learns to make CNV calls from the downsampled read-depth signals of the same aDNA samples. LYCEUM achieves accurate detection of CNVs even in typically low-coverage ancient genomes. We also observe that the segmental deletion calls made by LYCEUM correlate with the demographic history of the samples and exhibit patterns of negative selection in line with natural selection.
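The fine-tuning stage pairs labels called on full-coverage data with a degraded version of the same sample's depth signal. The sketch below shows one way such training pairs could be built. It assumes per-bin read counts and uses binomial thinning as the downsampling model, which keeps each read independently with probability `fraction`. The abstract does not specify LYCEUM's binning, downsampling procedure, or label format. `thin_read_counts`, `make_finetune_pairs`, the example counts, and the label strings are therefore all hypothetical.

```python
import random


def thin_read_counts(counts: list[int], fraction: float, seed: int = 0) -> list[int]:
    """Simulate lower coverage by keeping each read independently with probability `fraction`.

    Binomial thinning of per-bin counts; it ignores reads spanning bin
    boundaries and read pairing, so it only approximates real read subsampling.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so fraction=1.0 keeps every read and 0.0 keeps none.
    return [sum(rng.random() < fraction for _ in range(c)) for c in counts]


def make_finetune_pairs(counts: list[int], labels: list[str], fractions: list[float], seed: int = 0):
    """Pair each downsampled depth profile with labels called on the full-coverage data (hypothetical)."""
    if len(counts) != len(labels):
        raise ValueError("need one label per bin")
    return [(thin_read_counts(counts, f, seed + i), labels) for i, f in enumerate(fractions)]


# Hypothetical per-bin read counts from one high-coverage ancient genome
# and the high-confidence calls made on it at full coverage.
full_depth = [30, 29, 14, 15, 31]
calls = ["none", "none", "DEL", "DEL", "none"]
for depth, labels in make_finetune_pairs(full_depth, calls, [0.1, 0.05]):
    print(depth, labels)
```

The labels stay fixed while only the depth signal is thinned. A model trained on these pairs is therefore pushed to recover the full-coverage calls from sparse input. Drawing several fractions per sample is one way to stretch a small set of high-coverage genomes into more training examples. The abstract does not say whether LYCEUM does this.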

AVAILABILITY AND IMPLEMENTATION

LYCEUM is available at https://github.com/ciceklab/LYCEUM.
