Monte Carlo simulations and NHST
In many social sciences, p-values and null hypothesis significance testing (NHST) are routinely used to draw far-reaching scientific conclusions, despite the fact that they are as a rule poorly understood and that there exist alternatives that are both easier to understand and more informative.
Not least, confidence intervals (CIs) and effect sizes are preferable to the Neyman-Pearson-Fisher mishmash approach that is so often practised by applied researchers.
Running a Monte Carlo simulation with 100 replications of a fictitious sample of size N = 20, drawn from a normally distributed population with mean 10 and standard deviation 15, and computing 95% confidence intervals and two-tailed p-values for the (true) null hypothesis that the mean equals 10, we get varying CIs (since they are based on varying sample standard deviations). But with a minimum of 3.2 and a maximum of 26.1, they still give a clear picture of what would happen in an infinite limit sequence. The p-values, on the other hand (even though in a purely mathematical-statistical sense they are more or less equivalent to CIs), vary strongly from sample to sample, and jumping around between a minimum of 0.007 and a maximum of 0.999, they don't give you a clue about what would happen in an infinite limit sequence! So I can't but agree with Geoff Cumming:
The problems are so severe we need to shift as much as possible from NHST … The first shift should be to estimation: report and interpret effect sizes and CIs … I suggest p should be given only a marginal role, its problems explained, and it should be interpreted primarily as an indicator of where the 95% CI falls in relation to a null hypothesised value.
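For reference, here is a minimal statement of what each replication in the simulation computes, matching the Gretl code below (with hypothesised mean \mu_0 = 10 and sample size n = 20):

$$ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}, \qquad \bar{y} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad p = 2\,\Pr\!\left(T_{n-1} \ge |t|\right) $$

where \bar{y} and s are the sample mean and standard deviation and T_{n-1} is a t-distributed variable with n - 1 degrees of freedom. Since the null value coincides with the true mean, the p-values are uniformly distributed, which is exactly why they jump around so wildly from sample to sample.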
In case you want to do your own Monte Carlo simulation, here’s one yours truly made using my favourite econometrics program Gretl:
nulldata 20
loop 100 --progressive
    # draw a sample of 20 observations from N(mean = 10, sd = 15)
    series y = normal(10,15)
    scalar df = $nobs - 1
    scalar ybar = mean(y)
    scalar ysd = sd(y)
    # standard error of the sample mean
    scalar ybarsd = ysd/sqrt($nobs)
    # t-statistic for the (true) null hypothesis that the mean is 10
    scalar tstat = (ybar - 10)/ybarsd
    # bounds of the 95% confidence interval
    scalar lowb = ybar - critical(t,df,0.025)*ybarsd
    scalar uppb = ybar + critical(t,df,0.025)*ybarsd
    # two-tailed p-value
    scalar pval = 2*pvalue(t,df,abs(tstat))
    # save the per-replication results to a new dataset
    store E:pvalcoeff.gdt lowb uppb pval
endloop
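Once the loop has finished, the stored results can be inspected directly in Gretl. Something like the following (assuming the same file path as above) prints descriptive statistics, including the minimum and maximum bounds and p-values discussed in the text:

open E:pvalcoeff.gdt
summary lowb uppb pval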