Chen Xiaoya, Xu Huinan, Yu Shengjie, Hu Wan, Zhang Zhongjin, Wang Xue, Yuan Yue, Wang Mingyue, Chen Liang, Lin Xiumei, Hu Yinlei, Cai Pengfei
BGI Research, Hangzhou 310030, China.
College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Biology (Basel). 2025 Jun 4;14(6):651. doi: 10.3390/biology14060651.
Gene expression regulation underpins cellular function and disease progression, yet its complexity and the limitations of conventional detection methods hinder clinical translation. In this review, we define "predict" as the AI-driven inference of gene expression levels and regulatory mechanisms from non-invasive multimodal data (e.g., histopathology images, genomic sequences, and electronic health records) instead of direct molecular assays. We systematically examine and analyze the current approaches for predicting gene expression and diagnosing diseases, highlighting their respective advantages and limitations. Machine learning algorithms and deep learning models excel in extracting meaningful features from diverse biomedical modalities, enabling tools like PathChat and Prov-GigaPath to improve cancer subtyping, therapy response prediction, and biomarker discovery. Despite significant progress, persistent challenges, such as data heterogeneity, noise, and ethical issues including privacy and algorithmic bias, still limit broad clinical adoption. Emerging solutions like cross-modal pretraining frameworks, federated learning, and fairness-aware model design aim to overcome these barriers. Case studies in precision oncology illustrate AI's ability to decode tumor ecosystems and predict treatment outcomes. By harmonizing multimodal data and advancing ethical AI practices, this field holds immense potential to propel personalized medicine forward, although further innovation is needed to address the issues of scalability, interpretability, and equitable deployment.