Aberasturi Dillon T, Piegorsch Walter W, Bedrick Edward J, Lussier Yves A
Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, USA.
Bio5 Institute, University of Arizona, Tucson, AZ, USA.
Stat (Int Stat Inst). 2023 Jan-Dec;12(1). doi: 10.1002/sta4.518. Epub 2022 Oct 24.
We describe a collaborative project involving faculty and students in a university bioinformatics/biostatistics center. The project focuses on identifying differentially expressed gene sets ("pathways") in subjects expressing a disease state, medical intervention, or other distinguishable condition. The key feature of the endeavor is the data structure presented to the team: a single cohort of subjects with two samples taken from each subject, one for each of two differing conditions, without replication. This structure leads, in essence, to a cohort of contingency tables, where each table compares the differential gene state with the pathway condition. Recognizing that correlations both within and between pathway responses can disrupt standard table analytics, we develop methods for analyzing this data structure in the presence of complicated intra-table correlations. The methods offer convenient approaches to this problem, using design effect adjustments from sample survey theory and manipulations of the summary table counts. Monte Carlo simulations show that the methods operate extremely well, validating their use in practice. In the end, the collaborative connections among the team members led to solutions that none of us would have envisioned separately.
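To make the idea of a design effect adjustment concrete, the following is a minimal sketch and not the authors' procedure, which the abstract does not spell out. It assumes a single 2x2 table of counts, a Kish-style design effect of the form 1 + (m - 1) * rho, and a first-order correction that divides the Pearson chi-square statistic by that design effect. The function names, the cluster size `m`, and the intra-cluster correlation `icc` are hypothetical and chosen only for illustration.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column margin must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def kish_design_effect(cluster_size, icc):
    """Assumed Kish-style design effect: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_test(a, b, c, d, cluster_size, icc):
    """Divide the Pearson statistic by the design effect and refer the
    result to a chi-square distribution with 1 degree of freedom."""
    chi2 = pearson_chi2_2x2(a, b, c, d)
    deff = kish_design_effect(cluster_size, icc)
    chi2_adj = chi2 / deff
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2_adj / 2.0))
    return chi2, deff, chi2_adj, p_value


if __name__ == "__main__":
    # Illustrative counts only; not data from the study.
    chi2, deff, chi2_adj, p = adjusted_test(30, 70, 10, 90, cluster_size=5, icc=0.25)
    print(f"chi2={chi2:.3f} deff={deff:.2f} adjusted={chi2_adj:.3f} p={p:.4f}")
```

With positive correlation the design effect exceeds 1, so the adjusted statistic shrinks and the test becomes more conservative than the unadjusted Pearson test, which is the qualitative behavior one would expect when correlated responses inflate the nominal sample size.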