Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China.
Faculty of Hepatopancreatobiliary Surgery, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, China.
BMC Med Genomics. 2021 Dec 20;14(1):298. doi: 10.1186/s12920-021-01144-1.
Mutational processes leave distinct signatures in genes. For single-base substitutions (SBSs), previous studies have suggested that mutation signatures are reflected not only in the mutated bases but also in neighboring bases. However, for lack of a method to identify features of the long sequences flanking mutated bases, the understanding of how flanking sequences influence mutation signatures remains limited.
We constructed a long short-term memory self-organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features with the LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas (TCGA) database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we used the K-means method to cluster patients into classes according to the composition of their mutation sequence signatures and then studied the differences in clinical features and survival between classes.
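The SOM clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the LSTM feature extractor is swapped for a plain one-hot encoding so that the example stays self-contained, and the function names, the 11-base window, the 1-D grid of 10 units, and the learning-rate and neighborhood schedules are all hypothetical choices. The input is synthetic, not TCGA data.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Flat one-hot encoding of a DNA string (4 values per base, order ACGT)."""
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), [BASES.index(b) for b in seq]] = 1.0
    return out.ravel()

def encode_sbs(ref_window, alt):
    """Hypothetical feature vector: reference window plus the substituted base.
    Stands in for the LSTM-derived features, which are not reproduced here."""
    return np.concatenate([one_hot(ref_window), one_hot(alt)])

def train_som(X, n_units=10, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map; returns weights (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialize units from randomly chosen samples.
    W = X[rng.choice(len(X), size=n_units, replace=True)].astype(float)
    grid = np.arange(n_units)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighborhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)          # pull neighbors toward sample
            step += 1
    return W

def assign_units(X, W):
    """Index of the nearest SOM unit for each row of X."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy usage with random 11-base windows (synthetic data).
rng = np.random.default_rng(1)
windows = ["".join(rng.choice(list(BASES), size=11)) for _ in range(200)]
# Alternate base: simply the next base after the central reference base.
alts = [BASES[(BASES.index(w[5]) + 1) % 4] for w in windows]
X = np.array([encode_sbs(w, a) for w, a in zip(windows, alts)])
W = train_som(X)
labels = assign_units(X, W)
```

Each substitution ends up with the index of its nearest SOM unit, which plays the role of a signature class in this toy setting. With learned LSTM features in place of `encode_sbs`, the same two functions would apply unchanged.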
Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. The MBs differed in the features of both their mutated bases and their flanking sequences. MBs reflected both the site and the pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to their MB composition. Significant differences in survival and clinical features were observed among the patient classes.
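Clustering patients by MB composition can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: composition is taken here to mean the fraction of each patient's substitutions falling in each MB class, K-means is written as plain Lloyd iterations in NumPy, and the function names and toy inputs are hypothetical. The study reports 7 patient classes; the toy data below uses `k=2`.

```python
import numpy as np

def mb_composition(patient_ids, mb_labels, n_mbs=10):
    """Per-patient fraction of substitutions in each MB class.
    Returns (sorted unique patient ids, matrix of shape (n_patients, n_mbs))."""
    patients, rows = np.unique(np.asarray(patient_ids), return_inverse=True)
    counts = np.zeros((len(patients), n_mbs))
    np.add.at(counts, (rows, np.asarray(mb_labels)), 1.0)  # accumulate counts
    return patients, counts / counts.sum(axis=1, keepdims=True)

def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) per row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # Recompute each center; keep the old one if its cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return nearest_center(X, centers), centers

# Toy usage: four hypothetical patients, each with a few labeled substitutions.
ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3", "p4", "p4"]
mbs = [0, 0, 1, 0, 0, 7, 7, 8, 7, 7]
patients, comp = mb_composition(ids, mbs)
classes, _ = kmeans(comp, k=2)
```

In this toy input, `p1` and `p2` are dominated by MB 0 while `p3` and `p4` are dominated by MB 7, so the two pairs land in different classes. The resulting class labels could then be compared against survival or clinical covariates.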
We provide a method for analyzing the characteristics of mutant sequences. The results of this study show that flanking sequences, together with mutated bases, shape the signatures of SBSs. MBs were shown to be related to the clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor for clinical prognosis. Further study of the mechanisms linking MBs to cancer characteristics is suggested.