Basson Abigail R, Cominelli Fabio, Rodriguez-Palacios Alexander
Division of Gastroenterology and Liver Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
Digestive Health Research Institute, University Hospitals Cleveland Medical Center, Cleveland, OH 44106, USA.
J Pers Med. 2021 Mar 23;11(3):234. doi: 10.3390/jpm11030234.
Poor study reproducibility is a concern in translational research. A commonly recommended solution is to increase sample size (N), i.e., add more subjects to experiments. The goal of this study was to examine and visualize data multimodality (data with >1 data peak/mode) as a cause of study irreproducibility. To emulate the repetition of studies and random sampling of study subjects, we used various simulation methods of random number generation based on published preclinical disease outcome data from human gut microbiota-transplantation rodent studies (e.g., intestinal inflammation; univariate, continuous data). We first used unimodal distributions (one mode; Gaussian and binomial) to generate random numbers. We showed that increasing N does not reproducibly identify statistical differences when group comparisons are repeatedly simulated. We then used multimodal distributions (>1 mode; Markov chain Monte Carlo methods of random sampling) to simulate similar multimodal datasets A and B (t-test p = 0.95; N = 100,000), and confirmed that increasing N does not improve the 'reproducibility of statistical results or direction of the effects'. Data visualization with violin plots of categorical random data simulations with five integer categories/five groups illustrated how multimodality leads to irreproducibility. Re-analysis of data from a human clinical trial that used maltodextrin as a dietary placebo illustrated multimodal responses between human groups, and after placebo consumption. In conclusion, increasing N does not necessarily ensure reproducible statistical findings across repeated simulations, due to randomness and multimodality. Herein, we clarify how to quantify, visualize and address disease data multimodality in research. Data visualization could facilitate study designs focused on disease subtypes/modes to help understand person-to-person differences and personalized medicine.
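The repeated-simulation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a simple two-component Gaussian mixture in place of the Markov chain Monte Carlo sampling named above, made-up mode locations and spreads, and a normal approximation to the Welch test statistic. All function names (`sample_bimodal`, `two_sided_p`, `repeat_study`) are hypothetical. Both groups are drawn from the same bimodal distribution, loosely mirroring the "similar datasets A and B" scenario.

```python
import math
import random


def sample_bimodal(n, rng, modes=((2.0, 0.5), (6.0, 0.5)), weight=0.5):
    """Draw n values from a two-component Gaussian mixture.

    The mode locations, spreads, and mixing weight are made-up
    illustration values, not estimates from the published data.
    """
    values = []
    for _ in range(n):
        mean, sd = modes[0] if rng.random() < weight else modes[1]
        values.append(rng.gauss(mean, sd))
    return values


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def two_sided_p(group_a, group_b):
    """Approximate two-sided p-value for a difference in means.

    Uses the Welch statistic with a normal approximation, which is an
    assumption of this sketch and is rough for small N.
    """
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def repeat_study(n, repeats, rng, alpha=0.05):
    """Repeat a two-group study; both groups share one bimodal distribution.

    Returns (fraction of repeats with p < alpha,
             fraction of repeats where mean(A) > mean(B)).
    """
    significant = 0
    a_higher = 0
    for _ in range(repeats):
        group_a = sample_bimodal(n, rng)
        group_b = sample_bimodal(n, rng)
        if two_sided_p(group_a, group_b) < alpha:
            significant += 1
        if sum(group_a) / n > sum(group_b) / n:
            a_higher += 1
    return significant / repeats, a_higher / repeats


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run can be repeated
    for n in (10, 100, 1000):
        frac_sig, frac_a_higher = repeat_study(n, repeats=200, rng=rng)
        print(f"N={n}: p<0.05 in {frac_sig:.2f} of repeats, "
              f"mean(A) > mean(B) in {frac_a_higher:.2f}")
```

Under these assumptions, the share of "significant" repeats and the direction of the observed difference are expected to fluctuate from run to run at every N rather than settle on one answer, because both groups come from the same distribution. Plotting the simulated groups, for example as violin plots, would show the two modes that a comparison of means alone hides.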