Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets.

Authors

Choudhery Sanjeevani, Brown A Jacob, Akusobi Chidiebere, Rubin Eric J, Sassetti Christopher M, Ioerger Thomas R

Affiliations

Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA.

Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, Massachusetts, USA.

Publication

mSystems. 2021 Oct 26;6(5):e0087621. doi: 10.1128/mSystems.00876-21. Epub 2021 Oct 19.

Abstract

TnSeq is a widely used methodology for determining gene essentiality, conditional fitness, and genetic interactions in bacteria. The Himar1 transposon is restricted to insertions at TA dinucleotides, but otherwise, few site-specific biases have been identified. As a result, most analytical approaches assume that insertions are expected to be randomly distributed among TA sites in nonessential regions. However, through analysis of Himar1 transposon libraries in Mycobacterium tuberculosis, we demonstrate that there are site-specific biases that affect the frequency of insertion of the Himar1 transposon at different TA sites. We use machine learning and statistical models to characterize patterns in the nucleotides surrounding TA sites that correlate with high or low insertion counts. We then develop a quantitative model based on these patterns that can be used to predict the expected counts at each TA site based on nucleotide context, which can explain up to half of the variance in insertion counts. We show that these insertion preferences exist in Himar1 TnSeq data sets from other mycobacterial and nonmycobacterial species. We present an improved method for identification of essential genes, called TTN-Fitness, that can better distinguish true biological fitness effects by comparing observed counts to expected counts based on our site-specific model of insertion preferences. Compared to previous essentiality methods, TTN-Fitness can make finer distinctions among genes whose disruption causes a fitness defect (or advantage), separating them out from the large pool of nonessentials, and is able to classify many smaller genes (with few TA sites) that were previously characterized as uncertain. 
When using the Himar1 transposon to create transposon insertion mutant libraries, it is known that the transposon is restricted to insertions at TA dinucleotide sites throughout the genome, and the absence of insertions is used to infer which genes are essential (or conditionally essential) in a bacterial organism. It is widely assumed that insertions in nonessential regions are otherwise random, and this assumption is used as the basis of several methods for statistical analysis of TnSeq data. In this paper, we show that the nucleotide sequence surrounding TA sites influences the magnitude of insertions, and these Himar1 insertion preferences (sequence biases) can partially explain why some sites have higher counts than others. We use this predictive model to make improved estimates of the fitness effects of genes, which help make finer distinctions of the phenotype and biological consequences of disruption of nonessential genes.
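
The idea of comparing observed insertion counts to counts expected from nucleotide context can be illustrated with a small Python sketch. This is not the TTN-Fitness implementation: the abstract does not specify the model's form, so the sketch substitutes a deliberately simple lookup table in which the expected count for a TA site is the mean count of all sites sharing the same flanking bases. The function names, the flank width, and the pseudocount are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import log2


def flanking_context(seq: str, ta_index: int, flank: int = 2) -> str:
    """Return `flank` bases on each side of the TA that starts at ta_index.

    Assumes the site lies at least `flank` bases from either end of seq.
    """
    left = seq[ta_index - flank:ta_index]
    right = seq[ta_index + 2:ta_index + 2 + flank]
    return left + right


def fit_expected_counts(contexts, counts):
    """Toy stand-in for a site-specific model (an assumption, not the paper's).

    The expected count for a context is the mean observed count over all
    sites that share that context.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for ctx, count in zip(contexts, counts):
        totals[ctx][0] += count
        totals[ctx][1] += 1
    return {ctx: total / n for ctx, (total, n) in totals.items()}


def log_fold_change(observed, expected, pseudocount=1.0):
    """log2 ratio of observed to expected counts; the pseudocount avoids log(0)."""
    return log2((observed + pseudocount) / (expected + pseudocount))
```

Under this sketch, a site whose count falls well below the expectation for its context yields a negative log fold change, which hints at a fitness effect rather than an unfavorable sequence context. A realistic analysis would fit the expectation only on sites in nonessential regions and would aggregate the per-site values over each gene.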

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/373a/8525568/eb6b290381e5/msystems.00876-21-f001.jpg
