Zhu Yicheng, Neeman Teresa, Yap Von Bing, Huttley Gavin A
Research School of Biology, The Australian National University, Canberra, Australian Capital Territory 2601, Australia.
Statistical Consulting Unit, The Australian National University, Canberra, Australian Capital Territory 2601, Australia.
Genetics. 2017 Feb;205(2):843-856. doi: 10.1534/genetics.116.195677. Epub 2016 Dec 14.
Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence-logo-style visualization to help identify specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in the human germline and in malignant melanoma. We recapitulate the known CpG effect and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within [Formula: see text] of the mutating base. Models are also presented for contrasting entire mutation spectra (the distribution of the different point mutations). We show that the spectra vary significantly between the autosomes and the X chromosome, with a difference in the T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry and markedly different neighboring influences. The methods we present are made freely available as a Python library at https://bitbucket.org/pycogent3/mutationmotif.
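As a rough illustration of the kind of question the abstract poses, the sketch below tests whether the base at a single flanking position is independent of mutation status. It uses a G-test, which is the deviance of the independence log-linear model for a two-way table. This is not the mutationmotif API: the function name `g_test_position`, the two-group layout (mutated versus control contexts), and the counts are all assumptions made for illustration. Testing interactions among several positions would need larger log-linear models than this.

```python
import math

BASES = "ACGT"


def g_test_position(mutated, control):
    """G-test of independence between base identity and mutation status.

    mutated, control: dicts mapping each base in BASES to the number of
    times it was seen at one flanking position (hypothetical inputs).
    Returns (G, p_value, terms), where terms holds each cell's contribution.
    """
    rows = {"mutated": mutated, "control": control}
    row_total = {k: sum(v[b] for b in BASES) for k, v in rows.items()}
    col_total = {b: mutated[b] + control[b] for b in BASES}
    n = sum(row_total.values())

    g = 0.0
    terms = {}
    for status, counts in rows.items():
        for base in BASES:
            observed = counts[base]
            # Expected count under the independence log-linear model.
            expected = row_total[status] * col_total[base] / n
            # Empty cells contribute nothing to the deviance.
            term = 2.0 * observed * math.log(observed / expected) if observed > 0 else 0.0
            terms[(status, base)] = term
            g += term

    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Chi-square survival function with (2 - 1) * (4 - 1) = 3 degrees of
    # freedom, written in closed form so only the standard library is needed.
    p_value = math.erfc(math.sqrt(g / 2.0)) + math.sqrt(2.0 * g / math.pi) * math.exp(-g / 2.0)
    return g, p_value, terms


# Made-up counts: C is over-represented next to the mutated base.
example_mutated = {"A": 10, "C": 70, "G": 10, "T": 10}
example_control = {"A": 25, "C": 25, "G": 25, "T": 25}
g_stat, p_val, cell_terms = g_test_position(example_mutated, example_control)
```

The per-cell terms indicate which bases drive a departure from independence, loosely analogous to the letter heights in a sequence-logo-style display.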