Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012.
Proc Natl Acad Sci U S A. 2023 Oct 10;120(41):e2221165120. doi: 10.1073/pnas.2221165120. Epub 2023 Oct 5.
Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: Despite their excellent accuracy, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed uncharacterized components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.