Touloumis Anestis, Marioni John C, Tavaré Simon
CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK; Computing, Engineering and Mathematics, University of Brighton, Brighton BN2 4GJ, UK.
CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK; EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Bioinformatics. 2016 Jul 15;32(14):2193-5. doi: 10.1093/bioinformatics/btw224. Epub 2016 Jun 7.
By collecting multiple samples per subject, researchers can characterize intra-subject variation using physiologically relevant measurements such as gene expression profiling. This can yield important insights into fundamental biological questions ranging from cell type identity to tumour development. For each subject, the data measurements can be written as a matrix with the different subsamples (e.g. multiple tissues) indexing the columns and the genes indexing the rows. In this context, neither the genes nor the tissues are expected to be independent and straightforward application of traditional statistical methods that ignore this two-way dependence might lead to erroneous conclusions. Herein, we present a suite of tools embedded within the R/Bioconductor package HDTD for robustly estimating and performing hypothesis tests about the mean relationship and the covariance structure within the rows and columns. We illustrate the utility of HDTD by applying it to analyze data generated by the Genotype-Tissue Expression consortium.
The R package HDTD is part of Bioconductor. The source code and a comprehensive user's guide are available at http://bioconductor.org/packages/release/bioc/html/HDTD.html
Supplementary data are available at Bioinformatics online.
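As a rough illustration of the data layout described in the abstract (one genes-by-tissues matrix per subject), the following Python sketch computes a sample mean matrix and a naive pooled covariance between columns. It does not use HDTD and does not reproduce the package's estimators or tests; the function names `mean_matrix` and `column_covariance` and the toy numbers are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def mean_matrix(samples):
    """Element-wise average of equally sized matrices, one matrix per subject."""
    n_rows, n_cols = len(samples[0]), len(samples[0][0])
    return [
        [fmean(s[i][j] for s in samples) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def column_covariance(samples):
    """Naive pooled covariance between columns (e.g. tissues).

    Each subject's matrix is centered by the sample mean matrix, and the
    products of centered entries are averaged over subjects and rows.
    This is a plain sample estimate (an assumption made for this sketch),
    not the estimator implemented in HDTD.
    """
    mean = mean_matrix(samples)
    n, n_rows, n_cols = len(samples), len(mean), len(mean[0])
    centered = [
        [[s[i][j] - mean[i][j] for j in range(n_cols)] for i in range(n_rows)]
        for s in samples
    ]
    denom = (n - 1) * n_rows
    return [
        [
            sum(c[i][a] * c[i][b] for c in centered for i in range(n_rows)) / denom
            for b in range(n_cols)
        ]
        for a in range(n_cols)
    ]


# Toy data: 3 subjects, each a 2 (genes, rows) x 2 (tissues, columns) matrix.
subjects = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 3.0], [5.0, 5.0]],
    [[3.0, 4.0], [4.0, 6.0]],
]

print(mean_matrix(subjects))        # mean expression per gene and tissue
print(column_covariance(subjects))  # pooled tissue-by-tissue covariance
```

A non-zero off-diagonal entry in the second result reflects dependence between tissues, which is the kind of structure that methods assuming independent columns would ignore. Transposing each subject's matrix and calling the same function would give the analogous naive estimate for the rows (genes).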