Hegde Megha, Nebel Jean-Christophe, Rahman Farzana
School of Computer Science and Mathematics, Kingston University, London, UK.
Bioinform Biol Insights. 2025 Sep 2;19:11779322251358314. doi: 10.1177/11779322251358314. eCollection 2025.
Interpreting the effects of variants within the human genome and proteome is essential for analysing disease risk, predicting medication response, and developing personalised health interventions. Due to the intrinsic similarities between the structure of natural languages and genetic sequences, natural language processing techniques have demonstrated great applicability in computational variant effect prediction. In particular, the advent of the Transformer has led to significant advancements in the field. However, transformer-based models are not without their limitations, and a number of extensions and alternatives have been developed to improve results and enhance computational efficiency. This systematic review investigates more than 50 language modelling approaches to computational variant effect prediction developed over the past decade, analysing the main architectures and identifying key trends and future directions. Benchmarking of the reviewed models remains unachievable at present, primarily due to the lack of shared evaluation frameworks and data sets.