## z-scores: Where is My Cut-Off?

**Moderators:** shawnski, jdenver, rogerwimmer

**Forum rules**

How do you ask the Research Doctor a question?

Just click on "New Topic" below and then type your question to me. Please put in a Subject and then ask your question in the body.

Your question is submitted anonymously unless you include your name.

Click "Submit" and then check back to find Dr. Wimmer's answer to your question.

If you wish to have a private answer to a question (not posted in the forum), please include your email address. Dr. Wimmer will send your answer directly to you.

Dr. Wimmer maintains an Archive of questions from the forum. Go to http://www.rogerwimmer.com and click on "The Research Doctor Archive" link.

- rogerwimmer
**Posts:** 197 **Joined:** Mon Sep 28, 2009 12:40 pm

### z-scores: Where is My Cut-Off?

Doctor: I finally successfully converted my latest sets of callout to z-scores, and I am very much enjoying looking at the songs sorted this way. Thank you very much for so thoroughly educating your readers on the merits of using z-scores.

After reading your z-score archives tonight, one question didn't come up that seems important: How do I determine where my cut-off should be using z-scores? As in, what z-score will indicate to me that the song in question is a dog that my listeners don't want licking them?

If it helps, my most recent test had a top z-score of 1.48 and a bottom of -2.11. Clearly the -2.11 is a candidate for dropping, but what of a song that's a -0.02?

Sorry if you answered this somewhere and I simply missed it. Thanks in advance for setting me straight. - Anonymous

Anon: While I have answered a few questions about z-score cut-off levels, your question is unique because of how you are using them. I'll explain that in a moment. You're welcome for the information about z-scores. I hope other readers have decided to use them for music tests and perceptual studies because z-scores make interpretation of the data about 99.9% easier than reading simple summary statistics (mean, median, mode, etc.). OK, on to your question...

Here is why your situation is unique . . . Like everything in behavioral research, z-scores are affected by sample size. The results from a small sample of respondents usually can't be generalized to the population from which the sample was drawn because the small number of people usually doesn't represent the range of opinions/likes/dislikes of the population. That's known as sampling error.

Likewise, computing z-scores on a small number of data points, such as 20 songs in a callout, may not produce results that initially look meaningful. When z-scores are computed on a large sample (usually over 100), the plotted results (the z-scores plotted on a graph) will look like the typical "normal bell curve" discussed in statistics classes: a "hump" in the middle with gradually declining lines to the left and right of the hump (search online for images of the normal curve to see examples).
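The z-score computation itself is simple: subtract the mean of all the scores from each song's raw score, then divide by the standard deviation. A minimal sketch (the song names and raw callout scores below are made up for illustration, not from the reader's data):

```python
import statistics

# Hypothetical raw callout scores (e.g., average "like" rating per song)
scores = {"Song A": 78.2, "Song B": 64.5, "Song C": 71.0,
          "Song D": 55.3, "Song E": 69.8}

mean = statistics.mean(scores.values())
# Population standard deviation, since these are all the songs tested
stdev = statistics.pstdev(scores.values())

# z-score = (raw score - mean) / standard deviation
z_scores = {song: (raw - mean) / stdev for song, raw in scores.items()}

# Sort from best- to worst-testing song
for song, z in sorted(z_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{song}: {z:+.2f}")
```

By construction, the z-scores always average to 0.0, which is why the mean is the natural reference point when reading them.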

In your case, I'm about 99.9% sure that your plotted results do not represent a normal curve. What you probably have is a "hump" that is skewed to the left of the mean, which is always 0.0 with z-scores. This is further suggested by your top z-score of 1.48 and your bottom of -2.11. My guess is that you have more negative z-scores than positive z-scores. I'm just guessing. However, that isn't a big deal. Here is a list of things you can do:

1. In a typical situation with a large sample, 50% of your scores (data) should fall between a z-score of -.68 and +.68. In a normal distribution situation where a large sample is used, z-score cut-offs are usually set at -1.5 to +1.5 or -2.0 to +2.0. In a music test, that means "excellent" songs are those that have a z-score of +1.5/+2.0 or higher, and "bad" songs are those that have a z-score of -1.5/-2.0 or lower. ("Good" songs fall between -1.49/-1.99 and +1.49/+1.99.) That's typical. However, look at your z-scores. My guess is that 50% of your scores probably fall between -1.8 and +.50, or something close to that. That's a complete guess because I don't have your data.

2. Assuming that's the case, and I'm almost certain that it is, then move your cut-off levels to the left. Instead of using typical cut-offs of -1.5 and +1.5, use something like -1.0 and +1.0, or even -.8 and +.8. (The only way I can give you a valid and reliable answer is to see your data.)

3. Look at several callout reports and determine an average cut-off level. You shouldn't have to change the cut-off levels for each callout. The average will help you determine consistent levels.

4. Finally, you asked a question about interpretation when you said, "Clearly the -2.11 is a candidate for dropping, but what of a song that's a -0.02?" A z-score of -.02 is an average testing song since it is only two hundredths away from the mean of 0.0. It's not a great song and it's not a terrible song. The decision about what to do with average testing songs falls on the talent and skill of the PD.
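The advice above can be sketched in a few lines of code. The z-scores and the shifted cut-off values below are illustrative only (following point 2's suggestion of moving the cut-offs left for a skewed distribution), and the last lines check point 1's claim that about 50% of a normal distribution falls between -.68 and +.68:

```python
import statistics

# Illustrative z-scores from a callout (not the reader's actual data)
z_scores = {"Song A": 1.48, "Song B": 0.62, "Song C": -0.02,
            "Song D": -0.85, "Song E": -1.40, "Song F": -2.11}

# Cut-offs shifted left for a left-skewed distribution, per point 2
UPPER, LOWER = 1.0, -1.0

def classify(z: float) -> str:
    """Bucket a song by its z-score using the chosen cut-offs."""
    if z >= UPPER:
        return "excellent"
    if z <= LOWER:
        return "bad"
    return "good"

for song, z in z_scores.items():
    print(f"{song} ({z:+.2f}): {classify(z)}")

# Point 1's sanity check: in a true normal distribution, roughly half
# of all scores fall between z = -0.68 and z = +0.68.
nd = statistics.NormalDist()
half = nd.cdf(0.68) - nd.cdf(-0.68)
print(f"P(-0.68 < Z < +0.68) = {half:.3f}")  # about 0.50
```

Note that the -0.02 song lands in the "good" (average) bucket under any reasonable cut-offs, which matches point 4: its fate is a programming judgment call, not a statistical one.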

If you would like a more definitive answer, I need to see your data. I'll be happy to look at your numbers . . . confidentially, of course. If you're interested, send the information via email to roger@rogerwimmer.com.

(Want to comment on this question? Click on the POSTREPLY button under the question.)

Roger Wimmer is owner of Wimmer Research and senior author of Mass Media Research: An Introduction, 10th Edition.