P-Value vs Level of Significance

What is the logic behind the fact that if the p-value is lower than the level of significance, you can automatically reject the null hypothesis? My intuition leads me to believe it would be the opposite.

I say this frequently, but CFA Institute (and the prep providers) are disgustingly incompetent at teaching statistics, which is why they lead you to believe it is “automatic”, and why they can’t provide a decent explanation of most things. This kind of hypothesis testing is actually a mashup of 2 distinct schools of thought (Fisher’s significance testing and Neyman-Pearson hypothesis testing).

The true definition of the p-value is the probability of observing a result (say, a sample mean) at least as contradictory to the null hypothesis (as far away from it, showing as much disagreement) as the observed result, assuming the null hypothesis is true.

In context, for a two-tailed test: suppose a sample correlation coefficient of r = .7 with a p-value of .03. If there truly is zero Pearson correlation (the null is true, rho = 0), the probability of seeing a sample correlation at least as extreme as the one observed (|r| ≥ .7, meaning r ≤ -.7 or r ≥ .7) is .03.
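To make that definition concrete, here is a rough simulation sketch in plain Python. It draws many samples in which the null really is true (x and y independent, so rho = 0) and counts how often |r| comes out at .7 or larger. The sample size n = 10 is purely an assumption for illustration, since the example above does not state one, so the estimate should land in the same neighborhood as .03 rather than match it exactly. The function names are made up for this sketch.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_p_value(r_observed, n, n_sims=20000, seed=1):
    """Estimate the two-tailed p-value as the share of null samples
    whose |r| is at least as large as |r_observed|."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # Under the null, x and y are generated independently (rho = 0).
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) >= abs(r_observed):
            extreme += 1
    return extreme / n_sims


# n = 10 is a hypothetical sample size, not taken from the example above.
p_hat = simulated_p_value(r_observed=0.7, n=10)
print(f"estimated two-tailed p-value: {p_hat:.3f}")
```

The point is only that the p-value is a frequency computed in a world where the null holds: a small value means results like yours rarely show up in that world.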

The idea is that if we see relatively small p-values, we should suspect that something is amiss, such as a bias or a violated assumption. If we correct the biases, repeat the experiments/studies, and see similar results, maybe one of our assumptions (such as the null hypothesis) is incorrect. This approach was developed for inductive inference, with the p-value serving as a continuous measure of evidence.

The alpha cutoff was introduced as a way to use summarized evidence to make decisions while controlling long-run error rates (the Type I error rate is defined as alpha, the Type II error rate as beta). For each decision, we must weigh the risks and benefits of committing each kind of error. The hypotheses should be set up so that the more egregious error is the Type I error, since that is the one we most want to avoid and the one whose long-run rate we can control more accurately. This approach, in contrast, was developed for behavioral decisions.

The merged idea (some people say this was a bad idea): IF the null hypothesis is true, we make a Type I error any time we reject Ho. So, if we set alpha at .05, then any time we see a p-value .05 or less, we reject Ho. If the null is true, this will occur at a rate of alpha (.05 in our example, assuming continuous distributions and valid assumptions).
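The "rate of alpha" claim can be checked with a small simulation. The sketch below assumes a simple setup that is not from the discussion above: a two-tailed z-test of H0: mean = 0 with known standard deviation 1 and samples of size 30. It repeatedly draws data, computes the p-value, and rejects whenever p is at or below alpha. When the null is true, the share of rejections should come out close to alpha. The function names are made up for this sketch.

```python
import math
import random
from statistics import NormalDist


def two_tailed_p(sample, sigma=1.0):
    """Two-tailed z-test p-value for H0: mean = 0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, alpha=0.05, n=30, n_sims=20000, seed=1):
    """Share of simulated studies in which we reject H0 (p <= alpha)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if two_tailed_p(sample) <= alpha:
            rejections += 1
    return rejections / n_sims


# The null is true here (true mean is 0), so every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)
print(f"long-run Type I error rate: {type_1_rate:.3f}")
```

Calling `rejection_rate` with a nonzero `true_mean` gives the rejection rate when the null is false, which is the power (1 minus beta) under the same assumed setup.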

Really simple explanation: we’re setting an a priori threshold, based on a long-run error rate, to decide whether the data provide enough evidence for us to reject one of several underlying assumptions (the null hypothesis). If the p-value falls below the threshold, it tells us our data disagree “enough” with the null (based on our threshold of alpha) to conclude in favor of the alternative.

Not reviewed for minor errors, feel free to correct or ask questions.

Thank you so much! This is the most in-depth and thoughtful reply I’ve ever been given on this website. Much appreciated

I’m sure some dude with a horse will outdo me in no time! Glad you thought it was helpful!