Large language model produces high accurate diagnosis of cancer from end-motif profiles of cell-free DNA.

Affiliations

Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.

Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae430.

DOI:10.1093/bib/bbae430

PMID:39222060

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11367762/

Abstract

Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, the instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating characteristic curve (AUROC) of 0.866 [95% confidence interval (CI), 0.773-0.959] for cancer diagnosis and 0.924 (95% CI, 0.841-1.0) for hepatocellular carcinoma (HCC) detection using 16 end-motifs. Performance increased with more motifs: with 64 end-motifs, the AUROC reached 0.886 (95% CI, 0.794-0.977) for cancer diagnosis and 0.956 (95% CI, 0.89-1.0) for HCC detection. On an external testing set, iLLMAC with 64 end-motifs achieved an AUROC of 0.912 (95% CI, 0.849-0.976) for cancer diagnosis and 0.938 (95% CI, 0.885-0.992) for HCC detection, significantly outperforming benchmarked methods. Furthermore, iLLMAC achieved high classification performance on datasets generated with bisulfite and 5-hydroxymethylcytosine sequencing. Our study highlights the effectiveness of LLM-based instruction-tuning for cfDNA-based cancer detection.
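
For readers unfamiliar with the input feature, an end-motif profile can be thought of as the relative frequency of each short sequence found at the 5' end of cfDNA fragments. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `end_motif_profile`, the toy fragments, and the choice of counting all k-mers are assumptions made for illustration. The abstract does not state how its 16 or 64 end-motifs are defined, so the fact that 4^2 = 16 and 4^3 = 64 should not be read as confirmation that 2-mers and 3-mers were used.

```python
from collections import Counter
from itertools import product


def end_motif_profile(fragments, k=2):
    """Return the relative frequency of every k-mer seen at fragment 5' ends.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Enumerate all 4**k possible motifs so absent motifs get frequency 0.
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or start with ambiguous bases (e.g. N).
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


# Toy fragments (assumed data); real input would come from sequencing reads.
toy_fragments = ["CCCATGGA", "CCAGTTAC", "TGACCGTA", "NCCATGGA", "ccttagga"]
profile = end_motif_profile(toy_fragments, k=2)
```

Calling the same function with `k=3` yields a 64-entry profile. How such a profile is then presented to an instruction-tuned LLM is not described in the abstract and is left out of this sketch.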

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4716/11367762/4d704bdbdac4/bbae430f1.jpg
