Jain Shantanu, Trinidad Marena, Nguyen Thanh Binh, Jones Kaiya, Neto Santiago Diaz, Ge Fang, Glagovsky Ailin, Jones Cameron, Moran Giankaleb, Wang Boqi, Rahimi Kobra, Çalıcı Sümeyra Zeynep, Cedillo Luis R, Berardelli Silvia, Özden Buse, Chen Ken, Katsonis Panagiotis, Williams Amanda, Lichtarge Olivier, Rana Sadhna, Pradhan Swatantra, Srinivasan Rajgopal, Sajeed Rakshanda, Joshi Dinesh, Faraggi Eshel, Jernigan Robert, Kloczkowski Andrzej, Xu Jierui, Song Zigang, Özkan Selen, Padilla Natàlia, de la Cruz Xavier, Acuna-Hidalgo Rocio, Grafmüller Andrea, Barrón Laura T Jiménez, Manfredi Matteo, Savojardo Castrense, Babbi Giulia, Martelli Pier Luigi, Casadio Rita, Sun Yuanfei, Zhu Shaowen, Shen Yang, Pucci Fabrizio, Rooman Marianne, Cia Gabriel, Raimondi Daniele, Hermans Pauline, Kwee Sofia, Chen Ella, Astore Courtney, Kamandula Akash, Pejaver Vikas, Ramola Rashika, Velyunskiy Michelle, Zeiberg Daniel, Mishra Reet, Sterling Teague, Goldstein Jennifer L, Lugo-Martinez Jose, Kazi Sufyan, Li Sindy, Long Kinsey, Brenner Steven E, Bakolitsa Constantina, Radivojac Predrag, Suhr Dean, Suhr Teryn, Clark Wyatt T
The Institute for Experiential AI, Northeastern University, Boston, MA, USA.
Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
Hum Genet. 2025 Mar;144(2-3):295-308. doi: 10.1007/s00439-025-02731-3. Epub 2025 Mar 8.
Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by using 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams and also evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided a small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.