Using Simulations to Explore Sampling Distributions: An Antidote to Hasty and Extravagant Inferences
Guillaume A Rousselet
Abstract
Most statistical inferences in neuroscience and psychology are based on frequentist statistics, which rely on sampling distributions: the long-run outcomes of multiple experiments, given a certain model. Yet sampling distributions are poorly understood and rarely considered explicitly when making inferences. In this tutorial and commentary, I demonstrate how to use simulations of sampling distributions to answer simple practical questions. For instance, if we could run thousands of experiments, what would the outcomes look like? And what do these simulations tell us about the results from a single experiment? Such simulations can be run a priori, given expected results, or a posteriori, using existing datasets. Both approaches help make explicit the data-generating process and the sources of variability; they also reveal the large uncertainty in our experimental estimates and lead to the sobering realization that, in most situations, we should not make a big deal out of results from a single experiment. Simulations can also demonstrate how selecting effect sizes conditional on an arbitrary cutoff (p ≤ 0.05) leads to a literature filled with false positives, a powerful illustration of the damage done in part by researchers' overconfidence in their statistical tools. The tutorial focuses on graphical descriptions and covers examples using correlation analyses, proportion data, and response latency data. All the figures and numerical values in this article can be reproduced using code available at https://github.com/GRousselet/sampdist.
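To make the two central ideas of the abstract concrete (simulating many experiments to obtain a sampling distribution, and filtering results at p ≤ 0.05), the following is a minimal Python sketch. It is not the article's code, which lives in the linked repository; it is an independent illustration under stated assumptions. It assumes bivariate normal populations with a true correlation `rho`, a fixed sample size `n` per experiment, and it swaps in Fisher's z approximation for the usual t-test of Pearson's r. The function names `simulate_correlations` and `is_significant` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_correlations(rho=0.3, n=20, n_experiments=10_000, seed=1):
    """Simulate many experiments; each estimates Pearson's r from n observations.

    Assumption (hypothetical setup): data are bivariate normal with unit
    variances and true correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Shape (n_experiments, n, 2): one row of n paired observations per experiment.
    data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=(n_experiments, n))
    x, y = data[..., 0], data[..., 1]
    # Centre each experiment's samples, then compute Pearson's r per experiment.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))


def is_significant(r, n, z_crit=1.959964):
    """Approximate two-sided test of H0: rho = 0 at alpha = 0.05.

    Uses Fisher's z transform, atanh(r) * sqrt(n - 3), as a stand-in for
    the exact t-test on r (an approximation, not the article's method).
    """
    z = np.arctanh(r) * np.sqrt(n - 3)
    return np.abs(z) >= z_crit


if __name__ == "__main__":
    n = 20
    r = simulate_correlations(rho=0.3, n=n)
    sig = is_significant(r, n)
    # The full sampling distribution is centred near the true rho ...
    print(f"mean r, all experiments:     {r.mean():.3f}")
    # ... but only a minority of experiments reach p <= 0.05 ...
    print(f"proportion with p <= 0.05:   {sig.mean():.3f}")
    # ... and those that do overestimate the true effect.
    print(f"mean r, 'significant' only:  {r[sig].mean():.3f}")
```

Plotting a histogram of `r` shows how widely a single experiment's estimate can stray from the true value, and comparing it with the histogram of `r[sig]` shows why conditioning on p ≤ 0.05 inflates published effect sizes. Setting `rho=0.0` gives the false-positive case: about 5% of experiments still pass the cutoff, and all of them report sizeable correlations.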