Z value versus Test Statistic

According to Kaplan, z = (observation - population parameter) / standard deviation. This is in the Common Probability Distributions section.

In the hypothesis testing section, the test statistic = (sample statistic - null hypothesis value for the parameter) / standard error.

Can someone explain the difference (if any) between the two, and when to use each, assuming their applications in the curriculum differ?


In the first situation, you know the population mean (which, in real life, you never know); you’re simply determining how many standard deviations a particular observation is from the population mean.

In the second situation, you don’t know the population mean (which, in real life, is the usual situation); you’re trying to determine whether it’s reasonable to conclude that the population mean is the hypothesized value.
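To make the two formulas concrete, here’s a quick Python sketch. All of the numbers are made up for illustration; nothing here comes from the curriculum itself.

```python
import math

# Case 1: z-score for a single observation when the population
# mean and standard deviation are known (hypothetical values).
mu, sigma = 100.0, 15.0          # known population parameters
x = 130.0                        # one observation
z_score = (x - mu) / sigma       # how many SDs x is from mu
# z_score = 2.0

# Case 2: test statistic for a hypothesis about the unknown
# population mean, using the standard error of the sample mean.
x_bar, s, n = 104.0, 12.0, 36    # sample mean, sample SD, sample size
mu_0 = 100.0                     # hypothesized (null) value of mu
std_error = s / math.sqrt(n)     # standard error of x-bar
test_stat = (x_bar - mu_0) / std_error
# test_stat = 2.0
```

Same arithmetic shape in both cases; the only real difference is which measure of spread goes in the denominator.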

So is a test statistic a form of a z score?

And what is the logic and theory behind the division by the standard error?

Also in your second paragraph, how can we determine if the population mean is the hypothesized value if we don’t even know the population mean?

A z-statistic is a kind of test statistic, as are chi-squared, F, t, and other statistics. “Test statistic” is a broad term; the others are specific to the underlying distribution used in the test.

You’re creating a test statistic to help you gauge how far away from the center of a sampling distribution your observed sample statistic is, so you divide by the standard deviation of the sampling distribution (the standard error).

Our sample is used to estimate features of the population. If we have a sample mean of 5, but believe the true mean is 10, we can calculate a test statistic based on this (because x-bar is an unbiased estimator of mu). This can help us determine if, based on our sample, it’s unreasonable to say the true mean, mu, is 10.
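Putting numbers on that example: the sample mean of 5 and hypothesized mean of 10 come from the post above, but the sample standard deviation and sample size below are made up so the sketch runs.

```python
import math

x_bar = 5.0   # sample mean (from the example above)
mu_0 = 10.0   # hypothesized true mean (from the example above)
s = 8.0       # hypothetical sample standard deviation
n = 64        # hypothetical sample size

se = s / math.sqrt(n)        # standard error of x-bar = 1.0
z = (x_bar - mu_0) / se      # z = -5.0
# |z| = 5 is far out in the tails of a standard normal, so with
# these (made-up) inputs it looks unreasonable to say mu is 10.
```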

You can create z-scores without knowing the population mean. From a practical standpoint, you’re dividing the deviation by the sample standard deviation to see how unusual the particular observation is (i.e. is it an outlier/suspect outlier?). For this, you use an estimate of the sampling variation for X, which is the sample standard deviation of X.

I would say, rather, that the second situation involves division by the standard error of the statistic because you’re trying to make an inference about the population parameter by comparing the sample statistic to the suspected center of its sampling distribution (i.e. how far from the center of x-bar’s sampling distribution is your observed x-bar?). For this you utilize an estimate of sampling variation for x-bar (the sample standard error of x-bar).

This, of course, is why we value you so much here.

Apologies for my ignorance:

  1. I don’t understand what the standard error is and how it relates to the standard deviation. Is the standard error simply the standard deviation of the sampling distribution? If that’s the case, what does “s” mean? I thought “s” represented the sample standard deviation?

  2. If the z-score is a test statistic, why don’t we divide it by the standard error?

  3. In the first post of the link below, why is the test statistic divided by the standard error?


  4. What is the relationship between the z-score and the test statistic in the link’s example?

I’m trying to keep the applications relevant to finance as I’m quite pathetic at statistics and non-finance examples will serve only to confuse me further.

You seem to handle nearly all of the heavy lifting!

Don’t feel the need to make that apology again…anyone can be put in a situation where they’re not familiar with something!

The standard error can be conceptualized as a standard deviation. The important part is that it’s the standard deviation for a particular statistic (and is calculated differently depending on the statistic), and it helps us describe the spread of the sampling distribution for that statistic. The sample standard deviation is typically referred to as “s”; you’re absolutely right. The standard error is often abbreviated “s.e.” or “s.e.m.” (the latter being specific to the standard error of the mean, used to describe the sampling distribution of x-bar). Remember, the sampling distribution of a statistic is what we can approximate by taking many samples of size n, say 10,000 of them, and calculating x-bar for each sample. The distribution of those x-bars is an approximation of the sampling distribution of x-bar (at whatever sample size we used for all 10,000 samples).
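That repeated-sampling idea is easy to see in a small simulation. The population below (normal, mean 50, SD 10) is assumed purely for illustration; the point is that the SD of the 10,000 sample means lands near sigma / sqrt(n).

```python
import random
import statistics

random.seed(42)

# Assumed population: normal with mean 50 and SD 10 (made-up values).
mu, sigma, n = 50.0, 10.0, 25

# Approximate the sampling distribution of x-bar: take 10,000
# samples of size n and record each sample's mean.
sample_means = []
for _ in range(10_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

# The SD of those 10,000 sample means approximates the standard
# error of the mean; theory says sigma / sqrt(n) = 10 / 5 = 2.
sem_empirical = statistics.stdev(sample_means)
```

With this many samples, `sem_empirical` should come out very close to 2.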

We do, when the situation is right. “Z-scoring” is typically the term used for a value calculated to tell us how far from the center a single observation is; it helps us detect outliers and suspect outliers. In this situation, you use the sample standard deviation, since it estimates variability for single observations. You could, though, calculate a z-statistic to make an inference about a population parameter like the mean (basically we’re asking whether our sample mean is statistically unusual given our assumptions, or within expectations: significant vs. nonsignificant). In this case, you would use the standard error of the mean, which represents the variability in sample means calculated from samples of size n. It just depends on what question you’re asking; then you pick the appropriate estimate of variability.
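Both uses, side by side on one hypothetical data set (the numbers are invented; 14.5 is planted as a suspect outlier):

```python
import math
import statistics

# A small hypothetical sample.
data = [9.8, 10.1, 10.4, 9.9, 10.2, 14.5, 10.0, 9.7, 10.3, 10.1]
x_bar = statistics.fmean(data)
s = statistics.stdev(data)
n = len(data)

# Use 1: z-score a single observation with the sample SD, to ask
# whether 14.5 is unusually far from the center of the sample.
z_obs = (14.5 - x_bar) / s

# Use 2: z-statistic for the sample mean with the standard error,
# to test the (hypothetical) claim that the true mean is 10.
z_mean = (x_bar - 10.0) / (s / math.sqrt(n))
```

Here the single observation looks unusual (z_obs is close to 3), while the sample mean is well within expectations under the null, so the two questions can get different answers from the same data.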

Because the standard error represents sampling variation for the sample mean (again, think of it as the standard deviation for the distribution of x-bar at that sample size). Since they want to make a statement about the population mean, we need to use our sample mean and a measure of sampling variation for the sample mean.

The z-statistic in the example is the calculated test statistic.

Practice is the only way to feel more comfortable. :wink:

Slowly getting the hang of it - thank you so much.

  1. So the SEM is basically what we get when we create a series of samples of size N. We compute the sample mean for each sample and then create a “sampling distribution” from the aforementioned sample means. The SEM is calculated as the standard deviation of the sample means in this distribution. The SEM measures how precise/accurate the estimator is relative to the population mean. **That is why, as N increases in size, the SEM decreases: we are capturing a larger % of the total population and thus there is less standard error relative to the population mean.**

Standard deviation, on the other hand, measures the amount of dispersion/scatter of the values in a single sample relative to that sample’s mean.

Are my above interpretations correct?

Additional question: Do we always use SEM when we calculate Confidence Intervals? Or is there a time we just use Standard Deviation?

Not sure if the bolded sentence is correct.

  2. Z-scores can be used in two situations: first, to measure the variability (standard deviation; how far from the center) of a SINGLE observation relative to a population parameter (such as the mean) or a sample mean; second, to measure the variability (standard error) of sample means so as to determine their relation to a hypothesized population mean.

In the first case, the equation is as follows:

Z = (observation - population parameter) / (Standard Deviation)

In the second case, the equation is as follows:

Z = (Sample mean - Null hypothesis value) / (Standard Error)

Are my interpretations and equations correct?

No problem!

I mentioned the repeated sampling because it’s helpful for conceptualizing. Normally, we calculate an estimate of the SEM from one sample (it takes time and money to keep sampling, and it’s not as practical). However, this idea of repeated sampling will help you understand the big picture, and it seems like you’ve moved towards that.

The next bit is mostly beyond the curriculum, although I think they should include it at some point (but don’t worry if it doesn’t make perfect sense). If we take a sample of n=50 and calculate x-bar, we can estimate the sampling distribution of x-bar by bootstrapping (a kind of resampling method). To estimate the sampling distribution of x-bar and the s.e.m., we would create a program that takes a large number of samples of size 50 from our actual sample. Basically, we tell a computer to draw one observation from our 50 and replace it after recording this value. When 50 draws with replacement have occurred, we calculate that new sample mean and set it aside. We repeat this many times, say 10,000 in total, so now we have 10,000 sample means and an estimate of the s.e.m. (the standard deviation of those sample means). This technique can be used in many scenarios (for many reasons), but again, it’s well beyond what I’ve seen in the curriculum. I mentioned it because you did seem to pick up on the concept, and I think it’s valuable (beyond the exam) to know what else is out there.
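A minimal sketch of that bootstrap procedure in Python (the “sample” below is simulated as a stand-in for real data, and the bootstrap estimate is compared against the usual s / sqrt(n) formula):

```python
import random
import statistics

random.seed(7)

# One actual sample of n = 50. Real data would go here; this
# simulated draw is just a stand-in so the sketch runs.
sample = [random.gauss(100, 20) for _ in range(50)]

# Bootstrap: draw 50 values WITH replacement from our sample,
# compute that resample's mean, set it aside, and repeat.
boot_means = []
for _ in range(10_000):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.fmean(resample))

# The SD of the 10,000 bootstrap means is our estimate of the
# s.e.m.; the textbook formula s / sqrt(n) should be similar.
sem_boot = statistics.stdev(boot_means)
sem_formula = statistics.stdev(sample) / len(sample) ** 0.5
```

The two estimates should agree closely here, which is reassuring: bootstrapping earns its keep in settings where no simple formula for the standard error exists.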

More or less, you’ve got it. I’d only change “estimator” to “estimate”, since the estimate is what we’ve calculated and the estimator is the more generic term for a statistic used to estimate a parameter. Also, the bolded part is only true for a consistent estimator, but you’ve got the overall picture.

SEM is used in a confidence interval for the mean. In general, you use the standard error relevant to the population parameter you’re estimating with a CI (slope coefficient, correlation coefficient, standard deviation, etc.).
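For the CI-for-the-mean case, here’s a sketch assuming a large-sample (z-based) 95% interval; the summary numbers are made up.

```python
import math

# Made-up summary numbers for illustration.
x_bar, s, n = 52.0, 8.0, 64
sem = s / math.sqrt(n)     # standard error of the mean = 1.0

z_crit = 1.96              # 95% two-sided critical z value
lower = x_bar - z_crit * sem
upper = x_bar + z_crit * sem
# 95% CI for the mean: (50.04, 53.96)
```

With a small n and unknown population SD you’d swap the 1.96 for the appropriate t critical value, but the SEM stays in the denominator either way.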

I would say measuring distance from center, in terms of standard deviations, is a good way to summarize it.

For the first one, you’d see observation minus x-bar (or whatever the sample statistic is). You’re basically using it to describe the individual value in the sample. It looks like you’ve gotten a much better picture of what’s going on here!

Wow. Thanks for the detailed explanations!

I know this is outside the scope of the curriculum, but I’m quite curious:

In terms of the 10,000 sample means example, the SEM would be the standard deviation of those sample means. In other words, we would calculate the “distance from center” (the center being the mean of the 10,000 sample means?) of the 10,000 sample means to get the SEM. And then, from there, we can see how reliable our sample mean is as an estimate of the population mean. Is that correct?

I’d love to see you continuing to contribute to AF - I’m sure many of the candidates would love your answers and explanations. Were you a stats major in undergrad?

You got it. Just remember it’s an estimate of the SEM. The true SEM would only be obtained by actually having access to the true sampling distribution, which won’t likely be the case, because we would have to take all possible repeated samples from the population instead of bootstrapping from our sample.

Since we’d have a computer doing this for us, we would request summary statistics for our 10,000 (or 1,000 or 50,000 or however many we generate) sample means. The summary statistics would tell us a standard deviation, and because this is the estimated standard deviation of sample means, it’s an estimate of the standard error of the mean.

Check this out if you’d like a brief intro: https://onlinecourses.science.psu.edu/stat464/node/13

PSU has an online applied statistics masters degree, but they have many course notes published and readily accessible with examples (free learning opportunity).

Thanks, I’ll try to stick around for a bit! I was not a stats major (I have some undergraduate and graduate level coursework, though), and I was a graduate teaching assistant for an undergrad statistics course (somewhat of a unique situation… right place, right time, with the right coursework). I’ve also done a bit of self-learning since then, as I really enjoy the theory and application of statistics. I occasionally get to speak with some PhD statisticians, which helps me make sure I actually understand some of the topics that are less common in texts. I’d eventually like to go back and get a graduate degree in stats (just have to find the time).