Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America.
Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America.
PLoS Comput Biol. 2020 Feb 25;16(2):e1007664. doi: 10.1371/journal.pcbi.1007664. eCollection 2020 Feb.
Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also becomes harder to locate transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package, tximeta, that automatically performs numerous annotation and metadata gathering tasks on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows by reducing overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
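The checksum-based paradigm described above can be illustrated with a minimal conceptual sketch. This is not the tximeta implementation (tximeta is an R package). The Python function names, the registry contents, the `reference_digest` key, and the choice of SHA-256 over concatenated sequences are all assumptions made for illustration only.

```python
import hashlib


def sequence_digest(sequences):
    """Return a SHA-256 hex digest over the given sequences, in order.

    Assumption: the digest is computed over the upper-cased sequences
    concatenated in order; a real tool may define this differently.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.upper().encode("ascii"))
    return h.hexdigest()


# Hypothetical toy reference transcriptome and registry of known digests.
TOY_REFERENCE = ["ACGTACGT", "GGCCTTAA"]
KNOWN_REFERENCES = {
    sequence_digest(TOY_REFERENCE): {
        "source": "ExampleDB",          # hypothetical annotation source
        "organism": "Example organism",  # hypothetical
        "release": "1",                  # hypothetical
    },
}


def annotate_import(quant_metadata, registry=KNOWN_REFERENCES):
    """Attach annotation metadata when the stored digest is recognized.

    `quant_metadata` stands in for metadata written alongside the
    quantification output; the `reference_digest` key is hypothetical.
    """
    digest = quant_metadata["reference_digest"]
    match = registry.get(digest)
    return {
        "reference_digest": digest,
        "annotation": match,            # None when the digest is unknown
        "identified": match is not None,
    }


if __name__ == "__main__":
    known = {"reference_digest": sequence_digest(TOY_REFERENCE)}
    unknown = {"reference_digest": sequence_digest(["TTTT"])}
    print(annotate_import(known)["identified"])
    print(annotate_import(unknown)["identified"])
```

Because the lookup key is derived from the reference sequences themselves, the annotation does not depend on file names or on labels supplied by the user, which is the property the abstract relies on for reproducibility.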