Large language model produces high accurate diagnosis of cancer from end-motif profiles of cell-free DNA.

Affiliations

Key Laboratory of Cancer Prevention and Therapy, Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.

Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin Medical University, Tianjin, 300060, China.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae430.

Abstract

Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, the instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating characteristic curve (AUROC) of 0.866 [95% confidence interval (CI), 0.773-0.959] for cancer diagnosis and 0.924 (95% CI, 0.841-1.0) for hepatocellular carcinoma (HCC) detection using 16 end-motifs. Performance increased with more motifs, reaching 0.886 (95% CI, 0.794-0.977) and 0.956 (95% CI, 0.89-1.0) for cancer diagnosis and HCC detection, respectively, with 64 end-motifs. On an external testing set, iLLMAC achieved an AUROC of 0.912 (95% CI, 0.849-0.976) for cancer diagnosis and 0.938 (95% CI, 0.885-0.992) for HCC detection with 64 end-motifs, significantly outperforming benchmarked methods. Furthermore, iLLMAC achieved high classification performance on datasets with bisulfite and 5-hydroxymethylcytosine sequencing. Our study highlights the effectiveness of LLM-based instruction-tuning for cfDNA-based cancer detection.
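To make the two quantities in the abstract concrete, here is a minimal Python sketch of an end-motif profile and of AUROC. It is illustrative only and is not the authors' pipeline. The function names `end_motif_profile` and `auroc` are hypothetical, and the sketch assumes an end-motif is the first k bases at the 5' end of a cfDNA fragment. The abstract does not say how the 16 and 64 end-motifs were defined or selected; the only fact used here is that k = 2 and k = 3 happen to give 16 and 64 possible motifs.

```python
from collections import Counter
from itertools import product


def end_motif_profile(fragment_ends, k=4):
    """Return {motif: frequency} over all 4**k possible k-mers.

    fragment_ends: iterable of 5' end sequences, one per cfDNA fragment
    (assumed input format, not taken from the paper).
    """
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragment_ends:
        motif = seq[:k].upper()
        # Skip ends that are too short or contain ambiguous bases such as N.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    # Motifs never observed get frequency 0.0 so every profile has the same keys.
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy fragment ends, invented for illustration.
    profile = end_motif_profile(["CCCAGT", "CCCATT", "ACGTAA", "NNNNAA"], k=4)
    print(len(profile), profile["CCCA"], profile["ACGT"])

    # Toy labels (1 = cancer, 0 = control) and model scores, also invented.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be the usual choice for larger cohorts. The confidence intervals reported in the abstract are not reproduced here because the abstract does not state how they were computed.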

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4716/11367762/4d704bdbdac4/bbae430f1.jpg
