TAPDANCE: an automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data.

Affiliation

Biostatistics and Bioinformatics Masonic Cancer Center, University of Minnesota, Minneapolis, USA.

Publication information

BMC Bioinformatics. 2012 Jun 29;13:154. doi: 10.1186/1471-2105-13-154.

Abstract

BACKGROUND

Next generation sequencing approaches applied to the analysis of transposon insertion junction fragments generated in high-throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches used to 1) map junction fragments within the genome and 2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation.

RESULTS

We describe the Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and used to keep track of sequences, mappings and common insertion sites. Associations between phenotypes and CISs are also identified using Fisher's exact test with multiple testing correction. In a case study using previously published data, we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data.
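To illustrate the Poisson enrichment step described above, here is a minimal Python sketch. It is not the TAPDANCE implementation: the abstract does not specify the windowing scheme, the insertion counting rules, or the correction method, so the fixed non-overlapping windows, the uniform background rate, the Bonferroni correction, and the names `poisson_upper_tail` and `candidate_cis` are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for a sketch; subtracting from 1 loses precision once the
    tail probability approaches floating-point epsilon.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def candidate_cis(positions, chrom_length, window, alpha=0.05):
    """Rank fixed windows on one chromosome by insertion enrichment.

    Assumptions (hypothetical, not taken from the paper): insertions are
    uniformly distributed under the null, windows do not overlap, and
    p-values are Bonferroni-corrected by the number of windows.
    Returns (start, end, count, adjusted_p) tuples sorted by adjusted_p.
    """
    n_windows = math.ceil(chrom_length / window)
    lam = len(positions) * window / chrom_length  # expected count per window
    counts = Counter(pos // window for pos in positions)
    hits = []
    for idx, k in counts.items():
        p_adj = min(1.0, poisson_upper_tail(k, lam) * n_windows)
        if p_adj < alpha:
            hits.append((idx * window, (idx + 1) * window, k, p_adj))
    return sorted(hits, key=lambda h: h[3])


if __name__ == "__main__":
    # Toy data: five insertions clustered near the start, two scattered.
    toy = [1000, 1200, 1500, 1700, 1900, 500000, 900000]
    for start, end, count, p_adj in candidate_cis(toy, 1_000_000, 10_000):
        print(start, end, count, p_adj)
```

With the toy data, only the window holding the five clustered insertions passes the corrected threshold; the two isolated insertions do not. A real analysis would also need to handle multiple chromosomes and non-uniform insertion backgrounds, which this sketch deliberately leaves out.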

CONCLUSIONS

The TAPDANCE process is fully automated, performs similarly to previous labor-intensive approaches, provides consistent results across a wide range of sequence sampling depths, can handle extremely large datasets, enables meaningful comparison across datasets and enables large-scale meta-analyses of junction fragment data. The TAPDANCE software will greatly enhance our ability to analyze these datasets in order to increase our understanding of the genetic basis of cancers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d501/3461456/67cacacd170e/1471-2105-13-154-1.jpg
