Zhang Jing, Kinch Lisa, Katsonis Panagiotis, Lichtarge Olivier, Jagota Milind, Song Yun S, Sun Yuanfei, Shen Yang, Kuru Nurdan, Dereli Onur, Adebali Ogun, Alladin Muttaqi Ahmad, Pal Debnath, Capriotti Emidio, Turina Maria Paola, Savojardo Castrense, Martelli Pier Luigi, Babbi Giulia, Casadio Rita, Pucci Fabrizio, Rooman Marianne, Cia Gabriel, Tsishyn Matsvei, Strokach Alexey, Hu Zhiqiang, van Loggerenberg Warren, Roth Frederick P, Radivojac Predrag, Brenner Steven E, Cong Qian, Grishin Nick V
Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Hum Genet. 2025 Mar;144(2-3):173-189. doi: 10.1007/s00439-024-02680-3. Epub 2024 Aug 7.
This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI), held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Although a variety of algorithms and methods were applied, the predictors performed similarly, with Kendall's tau correlation coefficients between predictions and experimental scores of around 0.3 for the majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially among the top predictions from different groups, was greater than the correlation between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing deleterious from benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating that considerable improvements are needed in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
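The two metrics named in the abstract, Kendall's tau between predicted and experimental scores and the ROC AUC for separating deleterious from benign variants, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six scores are invented toy values, the 0.5 cutoff for calling a variant deleterious is hypothetical, and the function names `kendall_tau_b` and `roc_auc` are not taken from the assessment itself.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total = n * (n - 1) // 2
    denom = math.sqrt((total - tied_x) * (total - tied_y))
    return (concordant - discordant) / denom if denom else float("nan")

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data: experimental and predicted fitness for six variants.
experimental = [0.05, 0.20, 0.45, 0.70, 0.90, 1.00]
predicted = [0.10, 0.65, 0.30, 0.95, 0.60, 0.80]

# Hypothetical cutoff: variants with experimental fitness below 0.5 are "deleterious".
is_deleterious = [e < 0.5 for e in experimental]

tau = kendall_tau_b(predicted, experimental)
# Lower predicted fitness should indicate a deleterious variant, so negate the score.
auc = roc_auc(is_deleterious, [-p for p in predicted])

print(f"Kendall's tau: {tau:.3f}")
print(f"ROC AUC:       {auc:.3f}")
```

Kendall's tau depends only on the rank order of the scores, so it is insensitive to how a predictor scales its output. The AUC additionally requires a binary deleterious/benign labeling of the experimental scores, and the cutoff used for that labeling is a choice of the assessor.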