RaggedExperiment: the missing link between genomic ranges and matrices in Bioconductor.

Author information

Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, NY 10027, United States.

Institute for Implementation Science and Population Health, City University of New York, New York, NY 10027, United States.

Publication information

Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad330.

Abstract

SUMMARY

The RaggedExperiment R/Bioconductor package provides lossless representation of disparate genomic ranges across multiple specimens or cells, in conjunction with efficient and flexible calculations of rectangular-shaped summaries for downstream analysis. Applications include statistical analysis of somatic mutations, copy number, methylation, and open chromatin data. RaggedExperiment is compatible with multimodal data analysis as a component of MultiAssayExperiment data objects, and simplifies data representation and transformation for software developers and analysts.

MOTIVATION AND RESULTS

Measurements of copy number, mutation, single nucleotide polymorphism, and other genomic attributes, which may be stored as VCF files, produce "ragged" genomic ranges data, i.e. data recorded at different genomic coordinates in each sample. Ragged data are not rectangular or matrix-like, presenting informatics challenges for downstream statistical analyses. We present the RaggedExperiment R/Bioconductor data structure for lossless representation of ragged genomic data, with associated reshaping tools for flexible and efficient calculation of tabular representations to support a wide range of downstream statistical analyses. We demonstrate its applicability to copy number and somatic mutation data across 33 TCGA cancer datasets.
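To make the idea of "ragged" data and its rectangular summaries concrete, the following is a minimal conceptual sketch in Python. It is not the RaggedExperiment API (the package itself is written in R); the toy data, the names `sparse_table` and `summarize`, and the choice of 1-based inclusive coordinates are all assumptions made for illustration. The first function builds a table with one row per distinct range and one column per sample, and the second computes a width-weighted mean of each sample's values over user-supplied query ranges.

```python
from typing import Dict, List, Optional, Tuple

# (chromosome, start, end, value); coordinates are assumed 1-based and inclusive.
Segment = Tuple[str, int, int, float]
Range = Tuple[str, int, int]

# Hypothetical toy data: each sample has segments at its own coordinates.
ragged: Dict[str, List[Segment]] = {
    "sampleA": [("chr1", 1, 100, 2.0), ("chr1", 101, 200, 3.0)],
    "sampleB": [("chr1", 1, 150, 1.0)],
}


def sparse_table(
    data: Dict[str, List[Segment]],
) -> Dict[Range, Dict[str, Optional[float]]]:
    """One row per distinct range, one column per sample; None where absent."""
    samples = sorted(data)
    ranges = sorted({(c, s, e) for segs in data.values() for c, s, e, _ in segs})
    table = {r: {name: None for name in samples} for r in ranges}
    for name, segs in data.items():
        for c, s, e, v in segs:
            table[(c, s, e)][name] = v
    return table


def overlap_width(a: Range, b: Range) -> int:
    """Number of positions shared by two inclusive ranges (0 if none)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)


def summarize(
    data: Dict[str, List[Segment]], queries: List[Range]
) -> Dict[Range, Dict[str, Optional[float]]]:
    """Width-weighted mean of each sample's values over each query range."""
    out: Dict[Range, Dict[str, Optional[float]]] = {}
    for q in queries:
        row: Dict[str, Optional[float]] = {}
        for name, segs in data.items():
            weighted = [(overlap_width(q, (c, s, e)), v) for c, s, e, v in segs]
            total = sum(w for w, _ in weighted)
            row[name] = sum(w * v for w, v in weighted) / total if total else None
        out[q] = row
    return out


if __name__ == "__main__":
    for rng, row in sparse_table(ragged).items():
        print(rng, row)
    print(summarize(ragged, [("chr1", 51, 150)]))
```

The ragged input keeps every sample's original segments, so nothing is lost; the two functions derive different rectangular views from it on demand. The weighted mean is only one possible summary, and a different reduction (for example, a maximum or a count of overlapping ranges) may suit mutation or open chromatin data better.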

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7e8/10272705/84e27a982c92/btad330f1.jpg
