From gene modules to gene markers: an integrated AI-human approach selects CD38 to represent plasma cell-associated transcriptional signatures.

Author information

Syed Ahamed Kabeer Basirudeen, Subba Bishesh, Rinchai Darawan, Toufiq Mohammed, Khan Taushif, Yurieva Marina, Chaussabel Damien

Affiliations

Department of Pathology, Saveetha Medical College and Hospital, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India.

The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States.

Publication information

Front Med (Lausanne). 2025 Mar 12;12:1510431. doi: 10.3389/fmed.2025.1510431. eCollection 2025.

Abstract

BACKGROUND

Knowledge-driven prioritization of candidate genes derived from large-scale molecular profiling data for targeted transcriptional profiling assays is challenging due to the vast amount of biomedical literature that needs to be harnessed. We present a workflow leveraging Large Language Models (LLMs) to prioritize candidate genes within module M12.15, a plasma cell-associated module from the BloodGen3 repertoire, by integrating knowledge-driven prioritization with data-driven analysis of transcriptome profiles.

METHODS

The workflow involves a two-step process: (1) high-throughput screening using LLMs to score and rank the 17 genes of module M12.15 based on six predefined criteria, and (2) prioritization employing high-resolution scoring and fact-checking, with human experts validating and refining AI-generated scores.
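The screening step described above can be pictured as a simple score aggregation. The following is only a minimal sketch, not the authors' implementation: the abstract does not name the six criteria, the scoring scale, or how scores are combined, so the criterion labels, the gene labels, the scores, and the use of an unweighted sum are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical criterion labels: the abstract states there are six
# predefined criteria but does not name them.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def rank_genes(
    scores: Dict[str, Dict[str, float]], top_n: int
) -> List[Tuple[str, float]]:
    """Sum each gene's per-criterion scores and return the top_n genes.

    An unweighted sum is an assumption; the source does not say how
    per-criterion scores are combined.
    """
    totals = {}
    for gene, per_criterion in scores.items():
        missing = [c for c in CRITERIA if c not in per_criterion]
        if missing:
            raise ValueError(f"{gene} is missing scores for {missing}")
        totals[gene] = sum(per_criterion[c] for c in CRITERIA)
    # Sort by total (descending), then by gene name so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder genes and scores, NOT values from the study.
toy_scores = {
    "GENE_A": dict(zip(CRITERIA, [8, 7, 9, 6, 8, 7])),
    "GENE_B": dict(zip(CRITERIA, [5, 4, 6, 5, 3, 4])),
    "GENE_C": dict(zip(CRITERIA, [9, 8, 7, 9, 8, 9])),
}

shortlist = rank_genes(toy_scores, top_n=2)
print(shortlist)
```

In a workflow like the one described, the shortlist from this kind of first pass would then go to the second step, where human experts check and refine the AI-generated scores.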

RESULTS

The first step identified five candidate genes (CD38, TNFRSF17, IGJ, TOP2A, and TYMS). In the second step, following human-augmented LLM scoring and fact-checking, CD38 and TNFRSF17 emerged as the top candidates. Transcriptome profiling data from three datasets were then incorporated into the workflow to assess expression levels and correlations with the module average across various conditions and cell types. On this basis, CD38 was prioritized as the top candidate, with TNFRSF17 and IGJ identified as promising alternatives.
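The data-driven check mentioned above, correlating each gene with the module average, can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual analysis: the abstract does not specify the correlation measure or how the module average is computed, so Pearson correlation and a per-sample mean over all module genes (including the gene being tested) are assumed here, and the gene labels and expression values are placeholders.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_with_module_average(
    expr: Dict[str, List[float]]
) -> Dict[str, float]:
    """Correlate each gene's profile with the per-sample module mean.

    Assumption: the module average is the mean over all genes in the
    module, including the gene under test.
    """
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    if any(len(values) != n_samples for values in expr.values()):
        raise ValueError("all genes must have the same number of samples")
    module_avg = [
        sum(expr[g][i] for g in genes) / len(genes) for i in range(n_samples)
    ]
    return {g: pearson(expr[g], module_avg) for g in genes}


# Placeholder expression values (rows: genes, columns: samples),
# NOT data from the study.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.5, 1.0],
}

corr = correlation_with_module_average(toy_expr)
for gene, r in sorted(corr.items(), key=lambda item: -item[1]):
    print(f"{gene}: r = {r:.3f}")
```

A gene that tracks the module average closely across conditions is a better stand-in for the whole module, which is the sense in which such a correlation could support choosing one representative gene.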

CONCLUSION

This study introduces a systematic framework that integrates LLMs with human expertise for gene prioritization. Our analysis identified CD38, TNFRSF17, and IGJ as the top candidates within the plasma cell-associated module M12.15 from the BloodGen3 repertoire, with their relative rankings varying systematically based on specific evaluation criteria, from plasma cell biology to therapeutic relevance. This criterion-dependent ranking demonstrates the ability of the framework to perform nuanced, multi-faceted evaluations. By combining knowledge-driven analysis with data-driven metrics, our approach provides a balanced and comprehensive method for biomarker selection. The methodology established here offers a reproducible and scalable approach that can be applied across diverse biological contexts and extended to analyze large module repertoires.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ca/11936944/b35f40219b88/fmed-12-1510431-g001.jpg
