Dumas-Mallet Estelle, Button Katherine, Boraud Thomas, Munafo Marcus, Gonon François
CNRS, UMR 5293, Institute of Neurodegenerative diseases, Bordeaux, France.
University of Bordeaux, UMR 5293, Institute of Neurodegenerative diseases, Bordeaux, France.
PLoS One. 2016 Jun 23;11(6):e0158064. doi: 10.1371/journal.pone.0158064. eCollection 2016.
There are growing concerns about effect size inflation and replication validity of association studies, but few observational investigations have explored the extent of these problems.
We used meta-analyses to measure the reliability of initial studies and to explore whether this reliability varies across biomedical domains and study types (cognitive/behavioral, brain imaging, genetic and "others").
We analyzed 663 meta-analyses describing associations between markers or risk factors and 12 pathologies within three biomedical domains (psychiatry, neurology and four somatic diseases). For each meta-analysis, we collected the effect size, sample size, publication year and Impact Factor of the initial study, the largest study (i.e., the one with the largest sample size) and the meta-analysis itself. Initial studies were considered replicated if they were in nominal agreement with the corresponding meta-analysis and if their effect size inflation was below 100%.
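The replication criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: the abstract does not give the inflation formula, so the sketch assumes inflation is the relative excess of the initial effect size over the meta-analytic one, expressed as a percentage. It also assumes that effect sizes are on a scale where 0 means no effect, that the direction of effect is not checked, and that pairs in which both results are non-significant count as replicated without an inflation check. The names `StudyPair`, `nominal_agreement`, `inflation_pct` and `is_replicated` are hypothetical.

```python
from dataclasses import dataclass

ALPHA = 0.05               # significance threshold mentioned in the abstract (p<0.05)
MAX_INFLATION_PCT = 100.0  # inflation limit mentioned in the abstract


@dataclass
class StudyPair:
    """An initial study paired with its meta-analysis (hypothetical structure)."""
    initial_effect: float  # effect size of the initial study (0 = no effect)
    initial_p: float       # p-value of the initial study
    meta_effect: float     # pooled effect size of the meta-analysis
    meta_p: float          # p-value of the meta-analysis


def nominal_agreement(pair: StudyPair, alpha: float = ALPHA) -> bool:
    """True if both results are significant or both are non-significant."""
    return (pair.initial_p < alpha) == (pair.meta_p < alpha)


def inflation_pct(pair: StudyPair) -> float:
    """Assumed definition: relative excess of the initial effect, in percent."""
    if pair.meta_effect == 0:
        raise ValueError("inflation is undefined when the meta-analytic effect is 0")
    excess = abs(pair.initial_effect) - abs(pair.meta_effect)
    return excess / abs(pair.meta_effect) * 100.0


def is_replicated(pair: StudyPair) -> bool:
    """Nominal agreement plus inflation below 100% (see assumptions above)."""
    if not nominal_agreement(pair):
        return False
    if pair.initial_p >= ALPHA:
        # Assumption: both results non-significant, so no inflation check applies.
        return True
    return inflation_pct(pair) < MAX_INFLATION_PCT


# Example: a significant initial study whose effect is more than twice the pooled one.
inflated = StudyPair(initial_effect=0.9, initial_p=0.01, meta_effect=0.4, meta_p=0.001)
print(nominal_agreement(inflated), is_replicated(inflated))
```

In this example the two results agree nominally, but the initial effect exceeds the pooled effect by roughly 125% under the assumed formula, so the pair would not count as replicated.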
Nominal agreement between initial studies and meta-analyses regarding the presence of a significant effect was no better than chance in psychiatry, whereas it was somewhat better in neurology and somatic diseases. Although the effect sizes reported by the largest studies and by meta-analyses were similar, most of those reported by initial studies were inflated. Among the 256 initial studies reporting a significant effect (p<0.05) and paired with significant meta-analyses, 97 effect sizes were inflated by more than 100%. Nominal agreement and effect size inflation varied with the biomedical domain and study type. Indeed, the replication rate of initial studies reporting a significant effect ranged from 6.3% for genetic studies in psychiatry to 86.4% for cognitive/behavioral studies. A comparison of eight subgroups showed that replication rate decreases with sample size and "true" effect size. We observed no evidence of an association between replication rate and publication year or Impact Factor.
The differences in reliability between biological psychiatry, neurology and somatic diseases suggest that there is room for improvement, at least in some subdomains.