empirical p-values and bootstrapping

I’m in the process of resurrecting a stats course for potential psych majors at a community college. I finished grad school in 1999 in clinical psychology. Though it was a good program, I found the first-year stats sequence uninspiring. Something was missing in the teaching, or maybe the professor was grinding some conceptual axe with colleagues who weren’t in the room.

I’d already taken part of a first-year sequence elsewhere, and that professor really brought home the idea that there was still a lot of debate and interpretation about how best to perform a statistical test. Perhaps this was my first memory of political disagreement in the field. So when I officially started grad school, I was turned off by the certainty the instructor projected about everything.

So, here I am now, thinking about how to introduce students to what may be their first course in stats. What does this have to do with p-values and bootstrapping?

Well, psych students are often taught a flow-chart style of choosing statistical tests, and p-values are just the thing that tells you whether you have a significant effect. We tend to skip over the intuitive understanding of what is meant by “the probability of finding an effect equally or more extreme than the sample statistic, assuming the null hypothesis is correct, or true.” Concretely, I think students see a goal post at a given alpha level, say .05, and if you pass the goal post, yay!, you found an effect.

But that misses some intuitive appreciation for p-values.

Recently I’ve been lurking in some Coursera courses, in particular from the Johns Hopkins group of Peng, Leek, and Caffo, and they’ve inspired me to dig a bit more into calculus and probability theory. Along the way, and maybe on some return trips to their teachings, I came across some introductions to bootstrapping and how one could calculate an empirical p-value as the proportion of bootstrapped estimates (of, say, a median) greater than the observed statistic (again, the sample median). In other words, the average of a Boolean vector (T, F, T, T, T, F, …), where you treat T = 1 and F = 0, is just the proportion of True values in the vector.
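To make the counting concrete, here is a minimal sketch in Python (the sample data and the choice of 10,000 resamples are my own made-up illustration, not from any particular course): resample the data with replacement, record the median of each resample, and then average a Boolean vector to get the proportion of bootstrap medians that exceed the observed median.

```python
import random

random.seed(1)

# Hypothetical sample: ten made-up reaction times in milliseconds
sample = [320, 295, 310, 450, 287, 305, 298, 410, 300, 315]

def median(xs):
    """Median of a list: middle value, or mean of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

observed = median(sample)

# Draw B bootstrap resamples (with replacement, same size as the sample)
# and compute the median of each one
B = 10_000
boot_medians = [median(random.choices(sample, k=len(sample)))
                for _ in range(B)]

# Boolean vector: is each bootstrap median greater than the observed one?
flags = [m > observed for m in boot_medians]

# Treating True as 1 and False as 0, the mean of the Boolean vector is
# just the proportion of True values -- the "counting" version of an
# empirical p-value described above
empirical_p = sum(flags) / B

print("observed median:", observed)
print("proportion of bootstrap medians above it:", empirical_p)
```

The only machinery here is resampling and counting, which is the point: the proportion falls out of summing a 0/1 vector and dividing by the number of resamples, no distribution tables required.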

So, this might be something I incorporate earlier in a stats class rather than later. Bootstrapping is advanced. But maybe the counting isn’t?