Department of Industrial and Management Systems Engineering, Kyung Hee University, Yongin, Gyeonggi, Korea.
Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA.
BMC Bioinformatics. 2022 Nov 8;23(1):469. doi: 10.1186/s12859-022-05012-2.
Early detection of cancer has been widely studied because of its paramount importance in biomedicine. Among the types of data used for this task, T cell receptors (TCRs) have recently come into the spotlight owing to growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from directly applying classical statistical or machine learning methods. Recent work has modeled this type of data in the framework of multiple instance learning (MIL). Although MIL is a novel approach to cancer detection from TCR sequences and has shown adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to improve both the performance and the explainability of cancer detection. The sparse attention structure drops uninformative instances from each bag and, in combination with a skip connection, achieves both interpretability and better predictive performance. Our experiments show that, compared with existing MIL approaches, MINN-SA yields the highest average area under the ROC curve across 10 different types of cancer. Moreover, the estimated attention weights indicate that MINN-SA can identify the TCRs that are specific for tumor antigens within the same T cell repertoire.
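To make the idea of attention that drops uninformative instances concrete, here is a minimal sketch of sparse attention pooling over one bag (one patient's TCR embeddings). The abstract does not say which sparse attention mechanism MINN-SA uses, so sparsemax is swapped in here purely as an assumption. The function names `sparsemax` and `sparse_attention_pool`, the toy embeddings, and the scores are all hypothetical. The skip connection is left out because the abstract does not describe where it is placed.

```python
from typing import List, Sequence, Tuple


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Project scores onto the probability simplex (sparsemax).

    Unlike softmax, low-scoring entries receive a weight of exactly zero.
    Assumption: sparsemax stands in for the paper's unspecified mechanism.
    """
    z_sorted = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumulative += z
        # The k-th largest score stays in the support while
        # 1 + k * z_(k) exceeds the sum of the top-k scores.
        if 1.0 + k * z > cumulative:
            tau = (cumulative - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def sparse_attention_pool(
    instances: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool a bag of instance embeddings into a single bag embedding.

    `instances` holds one embedding per TCR; `scores` holds one unnormalized
    attention score per TCR (in a real model these would be learned).
    Returns the weighted sum of the embeddings and the attention weights.
    """
    weights = sparsemax(scores)
    dim = len(instances[0])
    pooled = [
        sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Hypothetical bag of three 2-dimensional TCR embeddings.
    bag = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    raw_scores = [1.0, 0.8, -1.0]
    pooled, weights = sparse_attention_pool(bag, raw_scores)
    print("attention weights:", weights)
    print("bag embedding:", pooled)
```

In this toy bag the third instance has the lowest score and gets a weight of exactly zero, so it contributes nothing to the bag embedding. The nonzero weights can then be read directly as which instances drove the prediction, which is the kind of interpretability the abstract attributes to the estimated attentions.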