Music Tests & Small Samples

Ask the Research Doctor a question.

Moderators: shawnski, rogerwimmer, jdenver

Forum rules
How do you ask the Research Doctor a question?

Just click on "New Topic" below and then type your question to me. Please put in a Subject and then ask your question in the body.

Your question is submitted anonymously unless you include your name.

Click "Submit" and then check back to find Dr. Wimmer's answer to your question.

If you wish to have a private answer to a question (not posted in the forum), please include your email address. Dr. Wimmer will send your answer directly to you.

Dr. Wimmer maintains an Archive of questions from the forum. Go to and click on "The Research Doctor Archive" link.
Post Reply
User avatar
Posts: 197
Joined: Mon Sep 28, 2009 12:40 pm

Music Tests & Small Samples

Post by rogerwimmer » Mon Mar 29, 2010 11:54 am

Hi Roger: To deal with the lack of budgets and the high cost of music testing, some companies are now doing auditorium music testing using a sample as small as 40 participants. Typically, the sample would be all one gender spread over a 10 to 12 year age range. Is this a credible sample size? What is the smallest sample size you would say is statistically reliable?

Thank you for the work you do. - Anonymous

Anon: Hi to you too. You're welcome for the work I do. Thanks, and on to your question . . .

While your question may seem simple to some people, you have actually opened a "can of worms" about research and sampling, and I'm afraid my answer isn't going to be short because I need to address several things. So . . . bear with me, and you may want to get a 6-pack of your favorite beverage to drink while you read this.

Sample Size
The reason sample size is important in any research study is because sample size determines the amount of sampling error present in the study. With one exception that I'll explain in a moment, the general rule is: The smaller the sample, the larger the sampling error. Let me explain that . . .

Research projects conducted with human beings are dramatically different from research studies conducted with static elements, such as a study conducted by a chemist who is testing the tensile strength of different types of metals. Steel is steel. Aluminum is aluminum. But in behavioral science research with human beings, one respondent (subject, radio listener, TV viewer, etc.) is not exactly the same as another respondent in the sample. Humans are very different, and this is clearly evident when a random sample is selected without any qualifications—screener questions—such as age, sex, number of hours of radio listening per day, favorite radio station, and so on. In other words, behavioral science research uses humans who are very different (they vary from one another, called "variance" in research), and this variance introduces error when the results of such studies are generalized to the population from which the sample was drawn.

Sampling error was introduced by statisticians in the 1920s and 1930s to compensate for the differences among people in a sample. So, for example, when a behavioral research study is conducted, the researcher will say something like, "A sample of 400 respondents was used for this study, which produces a sampling error of ±4.9%." This means that if the study shows that 50% of the sample likes a specific radio station (or gives any other answer), the "real" answer is somewhere between 45.1% and 54.9%. The sampling error percentage is actually a statistical "fudge factor" because of the variance present in the respondents in the study. There is NO research study in the behavioral sciences that produces results with 100% certainty. None. Nada. Zilch. The reason? The studies involve human beings.
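The ±4.9% figure comes from the standard margin-of-error formula at the 95% confidence level, assuming maximum variance (p = 0.5). Here is a minimal Python sketch (the function name is mine, for illustration only):

```python
import math

def sampling_error(n, p=0.5, z=1.96):
    """Margin of error for a sample of size n.

    z = 1.96 corresponds to the 95% confidence level;
    p = 0.5 assumes maximum variance (the most conservative case).
    """
    return z * math.sqrt(p * (1 - p) / n)

# A sample of 400 produces roughly the ±4.9% mentioned above:
print(round(100 * sampling_error(400), 1))  # 4.9
# A sample of 100 produces roughly ±9.8% (about ±10%):
print(round(100 * sampling_error(100), 1))  # 9.8
```

Notice how the error grows as the sample shrinks, which is the "general rule" in action.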

This obviously means that NO radio research study should be interpreted without considering sampling error. A researcher, or anyone else, who says something like, "The study shows that 12% of the Women 18-34 name WAAA as their favorite," is not interpreting the data correctly. The researcher, or whoever, should say something like, "The study shows that about 12% of the Women 18-34 name WAAA as their favorite." Or, if using the actual sampling error, the statement would be, "The study shows that between 11% and 15% of the Women 18-34 name WAAA as their favorite."

In summary, sample size is important in research because it determines sampling error. If a sample is used that is too small, sampling error may approach 90% or more. That's crazy.

(Note: I have a sampling error calculator on the home page of my business website – click here and then click on the 95% option.)

Sample Size – An Exception to the "General Rule"
I said earlier that there is an exception to the general rule, "The smaller the sample, the larger the sampling error." This exception is going to be the basis/root/heart/foundation for the answer to your question.

Let's look at an example to demonstrate this exception. Let's say I'm interested in finding out which radio station is listened to most often by people in my city. In Research Study A, I go out to a busy street during lunchtime and interview 100 people, regardless of age or sex, or any other qualifier, and ask them, "Which radio station do you listen to most often during a typical day?" I record all the data and in my results, I say something like, "WBBB is the number one radio station in the city according to a recently conducted research study."

Now, in Research Study B, I go out to a busy street during lunchtime and stop 100 people, but only women who are 18-24 years old. I ask them if they live in the area and listen to the radio at least one hour per day. If "yes" to both questions, I then ask, "When you have the choice, which radio station do you listen to most often during a typical day?"

Which study, A or B, will have the least amount of sampling error (variance)? You better say "B" or I'll come out there and smack you. Obviously the women in Study B will have less variance because they are all in the same age group, live in the area, and listen to the radio at least one hour per day. I also eliminate the error of the influence of others in a person's listening choice by asking, "When you have the choice . . ."

In Study A, the calculated sampling error is ±10%. In reality, however, the error would be much higher because I could have interviewed small children, a mixture of males and females, or a bunch of other weird options. The results would give me only a very limited indication of which radio station is listened to most often. The results could not be generalized with any degree of certainty to the total population of the city. Moral? A decent sample may be large enough to produce good results, but without any qualifications (screeners), the results are virtually meaningless beyond something like, "Hey, this is interesting!"

Variance – So What?
Now, I hope it's apparent that including screener questions in a research study dramatically reduces sampling error. The reason is that the qualifiers force the sample to be homogeneous (similar). Heterogeneous samples are loaded with sampling error; homogeneous samples are not. That's a good thing for research projects. All radio stations have specific age and sex targets, and those people should be used in all research studies a radio station conducts, unless there is a need to look beyond the target. Listen to me now and believe me later.

Smaller Sample Music Tests?
With all that as an introduction, it's time to address your specific question about some research companies using smaller samples for music tests.

A typical auditorium music test usually includes 80-100 respondents. However, these respondents are usually broken into smaller age and/or sex cells. So . . . although the results from the Total Sample are looked at, researchers, consultants, and radio station people also look at the smaller cells. There is no problem with that as long as the respondents are good; that is, as long as the respondents are properly screened.

In essence, then, the smaller sample studies you ask about have been used since auditorium music tests started around 1982. I don't see anything wrong with using a sample of 40 participants as long as the sample is properly selected. If a radio station is interested only in Women 25-34 (or whatever), a strictly qualified sample of 40 is fine.

Another Sampling Thing to Consider
Now, this may shock you, but if a research study involves a very tightly controlled homogeneous sample, there is no problem with looking at cell sizes of 20. So, although I said a sample of 40 is fine with Women 25-34, if the screener to recruit the respondents is designed by an expert, then it should be possible to split that sample into 25-29 and 30-34 to get some indications of differences between the two cells. The homogeneity of the sample is the key and this can't be done in all studies with a broadly selected sample.

Music Test Design and Error Reduction
One more thing about reducing error. In addition to testing only a homogeneous sample, the research design used for music tests further reduces error. The design is called a "Repeated Measures Design," which means that the same sample of people repeatedly rates songs that are all about the same length in seconds, using only one rating scale. In and of itself, the design reduces error. Neat, eh? The combination of a homogeneous sample and a Repeated Measures Design makes music testing an extremely valid and reliable research procedure.

Don't believe me? Consider this . . . Let's say that a researcher is going to test 400 songs and tells the respondents, "The 400 hooks will vary in length from 4 to 15 seconds. Please use a 'Hate/Love' scale of 1-10 for the first 100 songs, a 1-7 scale for songs 101-200, a 1-3 scale for songs 201-300, and a 1 or 2 scale for songs 301-400." The respondents would walk out.

What to remember? (1) Homogeneous sample; (2) Repeated Measures Design.

Are you tired of reading? Only one more thing to go.

A Final Check on the Sample
Regardless of how perfectly a screener or questionnaire is designed, or how perfectly the recruiting was done, almost every research study will have one respondent, or maybe even a few, who don't belong in the sample. These people may have lied when they answered the screener questions, or guessed the "correct" answers even though they knew their answers were bogus. That's a fact of research and it happens in almost every study. That's cool. Some people like to sneak into things so they can get paid, but there is a way to weed these people out of the sample.

What I'm about to describe is what I call the "Wimmer Sample Verification Procedure" (WSVP). I developed this process several years ago, but I'm willing to pass it along here since I finally included it in the latest edition of our textbook, Mass Media Research: An Introduction.

I mentioned research "variance" earlier. The term refers to how people differ from each other. Variance is the key to finding research respondents who don't belong in a study, and it is the key to the WSVP. If a respondent varies too much from the rest of the sample, the person is eliminated. You can compute the WSVP yourself on a spreadsheet. If you can't do it, have your research company do it for you. If your research company can't do it, find another research company that can provide the information.

Here are the steps:
1. Calculate the standard deviation for each respondent's song ratings. Let me be very clear here: The standard deviations must be computed for respondents, not songs.

2. Calculate z-scores for all the respondents' standard deviation scores.

3. Eliminate any respondent whose standard deviation z-score is greater than ±1.5. A person with a z-score that high does not belong in the sample. If you like, you can eliminate only those with a z-score greater than ±2.0, but I like 1.5 better.
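The three steps above are easy to sketch in code as well as in a spreadsheet. Here is a minimal Python illustration (the function name and data layout are my own, not part of any published procedure; Dr. Wimmer's description is the authority):

```python
import statistics

def wsvp_outliers(ratings_by_respondent, z_cutoff=1.5):
    """Flag respondents whose rating variability deviates too much
    from the rest of the sample, per the three WSVP steps.

    ratings_by_respondent: dict mapping respondent id -> list of song ratings.
    Returns the set of respondent ids to eliminate.
    """
    # Step 1: standard deviation of each *respondent's* ratings (not each song's).
    sds = {r: statistics.pstdev(scores)
           for r, scores in ratings_by_respondent.items()}

    # Step 2: z-score each respondent's standard deviation against the group.
    mean_sd = statistics.mean(sds.values())
    sd_of_sds = statistics.pstdev(list(sds.values()))
    if sd_of_sds == 0:  # everyone varies identically; no one to flag
        return set()
    z_scores = {r: (sd - mean_sd) / sd_of_sds for r, sd in sds.items()}

    # Step 3: eliminate anyone whose z-score exceeds the cutoff (±1.5 here).
    return {r for r, z in z_scores.items() if abs(z) > z_cutoff}

# Four consistent raters and one "ringer" who rates wildly:
sample = {
    "r1": [5, 6, 7, 5, 6],
    "r2": [6, 6, 7, 6, 7],
    "r3": [4, 5, 6, 5, 4],
    "r4": [7, 7, 6, 7, 6],
    "ringer": [1, 10, 1, 10, 1],
}
print(wsvp_outliers(sample))  # {'ringer'}
```

Only the respondent whose variability stands apart from the group is flagged; the consistent raters all survive.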

The reason the WSVP works so well for music tests is that while someone may lie or guess his/her way into the study, it is virtually impossible for that person to mimic the responses of the other people in the sample because music test ratings are recorded individually – each person does not know how the rest of the group rated each song. A confederate respondent will virtually always have a z-score that is too high (positive or negative). The WSVP adds an additional element to why it's OK to conduct music tests with small samples.

What to remember? (1) Homogeneous sample; (2) Repeated Measures Design; (3) Wimmer Sample Verification Procedure. If these three things are followed/incorporated, there should be no problem conducting a music test with 40 respondents.

Any questions, or is that enough?

(Want to comment on this question? Click on the POSTREPLY button under the question.)
Roger Wimmer is owner of Wimmer Research and senior author of Mass Media Research: An Introduction, 10th Edition.


Re: Music Tests & Small Samples

Post by rogerwimmer » Mon Mar 29, 2010 2:15 pm

Dr. Wimmer: I would also like to thank you for the information you provide on your column. This is the best piece I have ever read about music tests and sampling. I learned a great deal from your answer to the person's question. Bravo! - Anonymous

Anon: Bravo? Wow, that's the first time I ever received that comment. You're welcome and I appreciate your comments. I now know that I write "pieces." Thanks for writing.


Re: Music Tests & Small Samples

Post by rogerwimmer » Wed Mar 31, 2010 4:02 pm

A big thank you Roger! One last question please. Does it add any further credibility to the results if I repeat these respondents over multiple tests? In other words will paneling respondents make the results more credible or less? - Anonymous

Anon: You're welcome.

It's always best to re-use respondents in any research, particularly music tests. The reason is that the results will be substantially more reliable since you're using many of the same people (or all of them), and you won't have to deal with the different sampling errors and other variables that come with a completely different sample.

A panel (the same respondents) in any research study that is repeated is always the best approach.
