Let’s Conduct a Presidential Election Opinion Poll

As the 2024 American presidential election approaches, the media will inundate us with opinion polls forecasting the election. How do pollsters conduct these polls? I will provide a non-technical overview in this post.  

For this overview, imagine the Biden campaign has hired us to conduct a poll.

Contact Method

We will ask (survey) a small subset (sample) of our population of interest, by telephone, about their voting preferences, along with questions that help us understand what drives these preferences.

Questionnaire Design

For our survey to yield unbiased responses, the questions must be clear and concise so that Democratic and Republican respondents with different worldviews interpret them similarly.

The order of questions can prime respondents, so let’s ask the all-important question of candidate preference right after the initial screening questions.

Given that Trump is a candidate, we must pay extra attention to eliciting honest opinions, not answers respondents believe are socially acceptable.

Sampling Decisions

At the outset, we must decide who to survey, how to select the participants, and how many to survey.

Who to Sample

American citizens 18 years and older comprise our population of interest. Because citizens must be registered before they can vote, we should survey registered voters. Even better, let’s survey registered voters who are likely to vote. We can determine registration status and voting likelihood right at the start of our phone survey.

Sample Selection

We will select our sample so that each voter has an equal chance of participating in our poll. Although this goal is difficult to achieve perfectly, we’ll do our best. Specifically, we will use random-digit dialing to generate a random sample of telephone numbers to call.
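Here is a minimal sketch of the idea behind random-digit dialing. It assumes we already have a list of area-code and exchange prefixes (the six-digit prefixes below are made up) and simply appends four random digits to a randomly chosen prefix. Every number in those banks therefore has the same chance of being dialed; this sketch does not screen out business or non-working numbers, and it gives equal chances to phone numbers, not to voters (someone with two phones gets two chances).

```python
import random

# Hypothetical area-code + exchange prefixes (first six digits); illustrative only.
PREFIXES = ["212555", "312555", "415555", "713555"]

def random_digit_dial(n_numbers, prefixes, seed=None):
    """Return n_numbers distinct ten-digit numbers: a random prefix plus four random digits."""
    if n_numbers > len(prefixes) * 10_000:
        raise ValueError("not enough distinct numbers in these prefixes")
    rng = random.Random(seed)
    numbers = set()  # a set avoids calling the same number twice
    while len(numbers) < n_numbers:
        prefix = rng.choice(prefixes)               # each prefix equally likely
        suffix = f"{rng.randrange(10_000):04d}"     # last four digits, 0000 to 9999
        numbers.add(prefix + suffix)
    return sorted(numbers)

sample = random_digit_dial(5, PREFIXES, seed=2024)
print(sample)
```

Because the last four digits are random, unlisted numbers are just as likely to be called as listed ones, which is the main appeal of the method.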

Sample Size Determinants

It may seem counterintuitive, but for a population as large as the American electorate, sample size barely depends on population size. It depends mainly on the following three factors:

Variability: Consider a scenario with no variation in voter preferences, meaning all Americans intend to vote for the same candidate. In this case, asking only one person is sufficient to predict the election, the large population notwithstanding. 

Conversely, when voter preferences vary, we need a larger sample to capture that variation; the more evenly voters are split, the larger the sample must be.

Accuracy/Error: Our sample is a tiny subset of the population, so a forecast based on it will likely differ from the actual value for the entire population. This error decreases as the sample size increases.

For instance, if our sample had a million respondents (though not practically feasible), our prediction would closely align with the actual voting preferences of the entire population. 

Confidence level: Our poll sample is one of numerous possible samples we could have chosen—each unique sample could yield varying results regarding support for Biden and Trump.

Using the principles of probability and statistics, together with the data from our single poll, we can estimate the range of results we would get if we polled all possible samples.

Furthermore, using these principles, we can state, with a specific degree of confidence, that the population-wide support for Biden lies within our computed range of results. 

For instance, say a poll with a margin of error of ±3% and a confidence level of 95% reveals that 46% of respondents support Biden. While 46% represents our best estimate, given the margin of error and confidence level, we can say with 95% confidence that the actual support for Biden lies within an interval of 43% to 49%.
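The interval in this example can be reproduced with the usual normal-approximation formula for a proportion. This is a sketch: the original example does not state the sample size, so we assume 1,068 respondents (the 95%, ±3% sample size computed later), and we use the observed 46% rather than the worst-case 50%, which is why the margin comes out slightly under 3 points.

```python
import math

def confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a poll proportion (z = 1.96 for 95%)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin

# 46% support for Biden among an assumed 1,068 respondents.
low, high, margin = confidence_interval(0.46, 1068)
print(f"46% ± {margin:.1%} -> {low:.1%} to {high:.1%}")
```

The printed interval runs from about 43.0% to 49.0%, matching the 43% to 49% range in the text.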

Increasing the sample size narrows this interval at the same confidence level; equivalently, for an interval of a given width, it boosts our confidence that the population-wide support for Biden lies within it.

Sample Size Determination

Computing sample size is simple: we plug values for the three sample size determinants described above into a straightforward formula (see notes).

How do we get these values? We will assume maximum variability (see notes), and the Biden campaign will provide us with the values for the margin of error and the level of confidence acceptable to them.

Ideally, the Biden campaign, or any other client, would want the least possible error. But accuracy comes at a cost, so it’s best to balance the two.

Accuracy Vs. Cost

For a poll with 95% confidence and a ±3% margin of error, the required sample is 1,068 respondents (see notes for calculations). If we bill our clients $50 per respondent polled, the Biden campaign would spend $53,400. Not too shabby!

Now, picture this: if the campaign wanted a margin of error of just ±1%, the required sample size would be 9,604, costing a whopping $480,200.

Reducing the margin of error from 3 to 1 percent would cost an extra $426,800.

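In seeking more and more accuracy, small gains come at a disproportionately higher cost: because the required sample size grows with the square of 1/(margin of error), cutting the margin of error to one-third multiplies the sample, and the bill, by about nine.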

To balance accuracy and affordability, pollsters and their clients usually settle on a sample of about 900 to 1,100 respondents, which corresponds to a margin of error of roughly ±3.3 to ±3.0 percentage points at 95% confidence. It’s a sweet spot of acceptable accuracy at a reasonable cost.
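The accuracy-versus-cost trade-off can be tabulated in a few lines. This sketch assumes worst-case variability (p = 0.5), 95% confidence, and the illustrative $50-per-respondent rate used above; the margin of error shrinks only with the square root of the sample size, while the cost grows linearly with it.

```python
import math

Z_95 = 1.96               # z score for 95% confidence
COST_PER_RESPONDENT = 50  # dollars, the illustrative billing rate from the text

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error (as a proportion) for a sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (900, 1068, 1100, 9604):
    moe = margin_of_error(n)
    print(f"n={n:>5}  margin=±{moe:.2%}  cost=${n * COST_PER_RESPONDENT:,}")
```

Moving from roughly 1,000 to roughly 9,600 respondents multiplies the cost about ninefold but only cuts the margin of error from about 3 points to 1 point.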

Fine Tuning Poll Results

Our random sampling procedure eliminates bias regarding who gets selected in our sample but doesn’t guarantee a perfect representation of the population’s demographic composition.

No worries, we can adjust the poll results to reflect the demographic makeup of the population.

For instance, if our sample underrepresents African Americans, who tend to favor Democratic candidates, the poll may underestimate support for Biden. To fix this issue, we will overweight the responses of African-American respondents.  

For weighting, we have to choose which variables (such as race, gender, age, region, education, income, and party affiliation) to use in aligning the sample with the population’s characteristics. This choice is an art: starting from the same unweighted data, different pollsters might produce different results based on their weighting choices.
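To see how weighting works, and why the choice of weighting variable matters, here is a toy post-stratification (cell weighting) sketch on one variable at a time. The ten respondents and the population shares are entirely hypothetical. Each respondent’s weight is the population share of their group divided by that group’s share of the sample, so an underrepresented group (here, Black respondents at 10% of the sample versus an assumed 20% of the population) counts for more.

```python
from collections import Counter

# Hypothetical sample of 10 respondents: (race, education, supports Biden?).
# These rows and the population shares below are made up for illustration.
ROWS = [
    ("Black", "no_college", True),
    ("other", "college", True),
    ("other", "college", True),
    ("other", "college", True),
    ("other", "college", False),
    ("other", "college", False),
    ("other", "college", True),
    ("other", "no_college", False),
    ("other", "no_college", False),
    ("other", "no_college", False),
]
respondents = [{"race": r, "education": e, "biden": b} for r, e, b in ROWS]

# Hypothetical population shares for each weighting variable.
POPULATION = {
    "race": {"Black": 0.20, "other": 0.80},
    "education": {"college": 0.40, "no_college": 0.60},
}

def poststratify(respondents, variable, population_shares):
    """Weight = population share of the respondent's group / sample share of that group."""
    counts = Counter(r[variable] for r in respondents)
    n = len(respondents)
    return [population_shares[r[variable]] / (counts[r[variable]] / n) for r in respondents]

def weighted_support(respondents, weights):
    """Weighted share of respondents who support Biden."""
    supporting = sum(w for r, w in zip(respondents, weights) if r["biden"])
    return supporting / sum(weights)

unweighted = weighted_support(respondents, [1.0] * len(respondents))
by_race = weighted_support(respondents, poststratify(respondents, "race", POPULATION["race"]))
by_education = weighted_support(
    respondents, poststratify(respondents, "education", POPULATION["education"])
)
print(f"unweighted {unweighted:.1%}, by race {by_race:.1%}, by education {by_education:.1%}")
```

With the same unweighted data (50.0% for Biden), weighting by race gives about 55.6%, while weighting by education gives about 41.7%. Real polls usually weight on several variables at once, commonly with raking (iterative proportional fitting), which this sketch does not attempt.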

In conclusion, understanding the art and science of opinion polls empowers us to interpret the barrage of election polls that will flood us in the coming months.

Notes

Here’s an excerpt from a recent presidential opinion poll report (you’ll notice it refers to some aspects discussed in this blog post). “Looking ahead to the 2024 general election, the NBC News poll shows Biden and Trump tied in a hypothetical contest among registered voters, 46% to 46%…The national NBC News poll was conducted Sept. 15-19 of 1,000 registered voters — including 848 contacted by cell phone — and has an overall margin of error of plus or minus 3.1 percentage points.” (https://www.nbcnews.com/meet-the-press/first-read/poll-overwhelming-majorities-express-concerns-biden-trump-ahead-2024-r-rcna111347)

We rely on probability theory to predict the election outcome. This theory is only applicable when working with a probability sample in which each member of the population has a known and non-zero chance of appearing in the sample.

Telephone surveys, the most common method for conducting presidential opinion polls, have a non-response bias problem. Many Americans don’t answer calls from unfamiliar numbers, and those who do often have different demographic characteristics than non-responders. If non-responders have different opinions than responders, our poll represents the views of those willing to participate, not the entire population. This issue makes it difficult to obtain a truly random sample.

Imagine we conduct a poll and find that 51% of voters support Biden and 49% support Trump. If we repeat the poll using the same procedure and sample size, we will likely get a different result because the respondents in our sample would be different. For instance, the second poll might show 52% for Trump and 48% for Biden. Oops! Had we relied on the first poll alone, we might have misinformed the Biden campaign.

We could repeat the poll a third time, a fourth time, and thousands of times, each with a different sample drawn from our population. These polls would return numerous values for support for Biden. The collection of these values is called a sampling distribution, and when the sample is large enough (for a proportion, a common rule of thumb is at least about 10 respondents on each side of the question, which a poll of 1,000 people easily satisfies), it is approximately normal. Consequently, our knowledge of the normal distribution applies to the sampling distribution; we use this knowledge to establish the confidence interval and associated confidence level for the poll.
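The thought experiment of repeating the poll thousands of times can be run directly as a simulation. This sketch assumes the true population support for Biden is exactly 50% (a hypothetical value) and treats each respondent as an independent draw, which is a good approximation when the population is huge. It then checks what fraction of the simulated polls land within the ±3-point margin of error.

```python
import random

def simulate_polls(true_support=0.50, n=1068, n_polls=5000, seed=1):
    """Repeat a simple random-sample poll many times; return each poll's share for Biden."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_polls):
        # Each respondent supports Biden with probability equal to the true support.
        yes = sum(rng.random() < true_support for _ in range(n))
        results.append(yes / n)
    return results

shares = simulate_polls()
margin = 1.96 * (0.5 * 0.5 / 1068) ** 0.5  # about 0.03, i.e. ±3 points
within = sum(abs(s - 0.50) <= margin for s in shares) / len(shares)
print(f"lowest {min(shares):.1%}, highest {max(shares):.1%}, within ±{margin:.1%}: {within:.1%}")
```

Individual polls scatter a few points either side of 50%, yet roughly 95% of them fall within ±3 points of the true value, which is exactly what a 95% confidence level promises.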

We derive the sample size formula by rearranging the expression for the margin of error of the sampling distribution. The margin of error is the product of the Z value and the standard error (the standard deviation of the sampling distribution): $e = Z\sqrt{p(1-p)/N}$. Solving for N gives the formula:

$$N = \frac{Z^2\,p\,(1-p)}{e^2}$$

N = sample size

Z = z score corresponding to the desired confidence level (specified by the client). For instance, Z = 1.96 for 95% confidence.

p = proportion of the population that we believe supports Biden, and (1-p) = the proportion that does not. The product p(1-p) measures the variation in the population; it is largest when support is split 50/50. We will play it safe and use this maximum possible variation (p = 0.5).

e = margin of error (specified by the client)

So, for example, if we want a 95% confidence level and are willing to accept a 3% margin of error, the sample size calculation would be:

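$$N = \frac{(1.96)^2\,(50)(50)}{(3)^2} \approx 1{,}067.11$$

Here p and e are both expressed in percentage points. Because we cannot poll a fraction of a person, we round up to 1,068 respondents.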
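The formula translates directly into code. This sketch works with proportions (0.03 rather than 3) and rounds up, with a tiny tolerance so floating-point noise does not push an exact result such as 9,604 up to 9,605. It also includes the standard finite population correction, an addition not used in the calculation above, to show why population size barely matters once the population is large.

```python
import math

def sample_size(margin, z=1.96, p=0.5):
    """Required sample size N = z^2 p (1 - p) / e^2, rounded up to a whole respondent."""
    raw = z ** 2 * p * (1 - p) / margin ** 2
    # Subtract a tiny tolerance so floating-point noise doesn't round an exact value up.
    return math.ceil(raw - 1e-9)

def with_finite_population(n, population):
    """Finite population correction: n / (1 + (n - 1) / population), rounded up."""
    return math.ceil(n / (1 + (n - 1) / population) - 1e-9)

print(sample_size(0.03))  # 95% confidence, ±3 points
print(sample_size(0.01))  # 95% confidence, ±1 point

# Less variability (p further from 0.5) means a smaller sample is needed.
for p in (0.5, 0.6, 0.8):
    print(p, sample_size(0.03, p=p))

# The correction only bites when the population is small.
for population in (10_000, 1_000_000, 240_000_000):
    print(population, with_finite_population(sample_size(0.03), population))
```

For a population in the hundreds of millions, the corrected sample size is still 1,068; only for a small population, such as a single town of 10,000 voters, does it drop noticeably.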

 

 


2 Comments

  1. As the US is winner-takes-all on a state basis, sampling may be meaningful only in closely contested states. One wants to predict the winner, not the percentage gap. The percentage gap is relevant only for likely losses, to decide the level of effort needed to turn the tables.

    1. Thank you, Arun.
      Winner and percentage gap are two sides of the same coin.
      Numerous state-wide polls supplement the national polls.