CYTO-SV-ML: A Machine Learning Tool for Cytogenetic Structural Variant Analysis in Somatic Cell Type Using Genome Sequences.

Author information

Zhang Tao, Auer Paul, Spellman Stephen R, Dong Jing, Saber Wael, Bolon Yung-Tsi

Affiliations

CIBMTR® (Center for International Blood and Marrow Transplant Research), NMDP (National Marrow Donor Program), Minneapolis, MN 55401, USA.

Division of Biostatistics, Institute for Health and Equity, Medical College of Wisconsin, Milwaukee, WI 53226, USA.

Publication information

Life (Basel). 2025 Jun 9;15(6):929. doi: 10.3390/life15060929.

Abstract

(1) Background: Although whole genome sequencing (WGS) has enabled comprehensive analysis of structural variants (SVs), more accurate and efficient methods are needed to distinguish large somatic SVs (SV size ≥ 1 Mb), traditionally detected through cytogenetic testing, from germline SVs. (2) Methods: We developed a customized machine learning pipeline (CYTO-SV-ML), automated with the Snakemake workflow manager and equipped with a user interface, to identify somatic cytogenetic SVs in WGS data. We applied this tool to characterize structural variation profiles in the whole blood of patients with myelodysplastic syndromes (MDSs). Known SVs mapped from well-established open databases were split into training and validation subsets for the AUTO-ML machine learning model in the CYTO-SV-ML pipeline. (3) Results: In benchmarking of somatic cytogenetic SV classification, the CYTO-SV-ML pipeline achieved an area under the receiver operating characteristic curve (AUCROC) of 0.94 for translocations and 0.92 for non-translocations, a sensitivity of 0.83 for translocations and 0.85 for non-translocations, and a specificity of 0.96 for translocations and 0.82 for non-translocations. In an independent validation against clinical cytogenetic records, our method (207 somatic cytogenetic SVs) outperformed a conventional SV calling pipeline (143 somatic cytogenetic SVs). In addition, the CYTO-SV-ML pipeline uncovered novel somatic cytogenetic SVs in 49 (89%) of 55 patients without successful clinical cytogenetic results. (4) Conclusions: Our study demonstrates the high performance of the CYTO-SV-ML machine learning approach for benchmarked SV classification from genome sequencing data; further validation of the novel anomalies by orthogonal methods will be essential to unlock its full clinical potential in cytogenetic diagnostics.
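The benchmark metrics reported in the Results can be made concrete with a short sketch. The abstract does not say how CYTO-SV-ML computes them, so everything below is an illustrative assumption: the function names `sensitivity_specificity` and `roc_auc`, the label convention (1 = somatic SV, 0 = germline SV), the 0.5 decision threshold, and the toy data are hypothetical. Sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and AUC-ROC is computed here through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = somatic SV, 0 = germline SV (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # somatic called somatic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # somatic missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # germline called germline
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # germline wrongly called somatic
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, NOT results from the paper.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    sens, spec = sensitivity_specificity(labels, preds)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={roc_auc(labels, scores):.3f}")
```

On the toy data, two of the three somatic SVs are recovered and no germline SV is misclassified, so sensitivity is 2/3 and specificity is 1.0, while the pairwise AUC is 8/9. The pairwise loop is O(P·N) and is meant only for illustration; on real data a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.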

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe9/12194788/b0a739c16e57/life-15-00929-g001.jpg