USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, United States.
Department of Computer Science, Iowa State University, Ames, IA 50011, United States.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae073.
Understanding the effects of genetic variants is crucial for accurately predicting traits and functional outcomes. Recent approaches have utilized artificial intelligence and protein language models to score all possible missense variant effects at the proteome level for a single genome, but a reliable tool is needed to explore these effects at the pan-genome level. To address this gap, we introduce a new tool called PanEffect. We implemented PanEffect at MaizeGDB to enable a comprehensive examination of the potential effects of coding variants across 50 maize genomes. The tool allows users to visualize over 550 million possible amino acid substitutions in the B73 maize reference genome and to observe the effects of the 2.3 million natural variations in the maize pan-genome. Each variant effect score, calculated from the Evolutionary Scale Modeling (ESM) protein language model, shows the log-likelihood ratio difference between B73 and all variants in the pan-genome. These scores are shown using heatmaps spanning benign outcomes to potential functional consequences. In addition, PanEffect displays secondary structures and functional domains along with the variant effects, offering additional functional and structural context. Using PanEffect, researchers now have a platform to explore protein variants and identify genetic targets for crop enhancement.
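The scoring idea described above can be illustrated with a minimal sketch. It assumes the common masked-language-model convention in which a substitution's score is the log-probability of the substituted residue minus the log-probability of the reference (here B73) residue at the same position; the abstract does not spell out the exact formula, so this is an assumption. The per-position logits are invented toy numbers standing in for protein language model output, and the names `llr_matrix` and `score_variant` are hypothetical, not part of the PanEffect code base.

```python
import math

# The 20 standard amino acids; one heatmap column per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw scores to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def llr_matrix(sequence, per_position_logits):
    """Build a (length x 20) table of log-likelihood ratios.

    Assumed convention: score = log p(substituted) - log p(reference),
    so the reference residue always scores exactly 0 and more negative
    values suggest a less tolerated substitution.
    """
    matrix = []
    for ref, logits in zip(sequence, per_position_logits):
        log_probs = dict(zip(AMINO_ACIDS, log_softmax(logits)))
        ref_lp = log_probs[ref]
        matrix.append({aa: lp - ref_lp for aa, lp in log_probs.items()})
    return matrix


def score_variant(sequence, matrix, variant):
    """Look up a variant written as <ref><1-based position><alt>, e.g. 'A2V'."""
    ref, pos, alt = variant[0], int(variant[1:-1]), variant[-1]
    if sequence[pos - 1] != ref:
        raise ValueError(f"reference residue mismatch at position {pos}")
    return matrix[pos - 1][alt]


# Toy input (hypothetical numbers, not real model output):
# a two-residue "protein" whose reference residues get the highest logits.
sequence = "MA"
logits = [[0.0] * len(AMINO_ACIDS) for _ in sequence]
logits[0][AMINO_ACIDS.index("M")] = 2.0
logits[1][AMINO_ACIDS.index("A")] = 2.0
logits[1][AMINO_ACIDS.index("V")] = 1.0

matrix = llr_matrix(sequence, logits)
a2v = score_variant(sequence, matrix, "A2V")
```

Each row of `matrix` corresponds to one row of a heatmap like the one the tool displays. Scoring a natural variant from another genome then reduces to a lookup in the reference table, which is why all possible substitutions can be precomputed once per reference protein.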
The PanEffect code is freely available on GitHub (https://github.com/Maize-Genetics-and-Genomics-Database/PanEffect). A maize implementation of PanEffect and underlying datasets are available at MaizeGDB (https://www.maizegdb.org/effect/maize/).