Divided We Stand

New book about the 2020 election.

Thursday, July 22, 2021

Why Were the 2020 Polls Off the Mark?

In Defying the Odds, we talk about the social and economic divides that enabled Trump to enter the White House. In Divided We Stand, we discuss how these divides played out in 2020. In both elections, however, polls tended to overstate Democratic margins.

From the AAPOR report on polling problems in 2020:

 Several proposed explanations can be ruled out as primary sources of polling error in 2020. Our analyses suggest the following. 

  • Polling error was not caused by late-deciding voters voting for Republican candidates. More voters voted prior to Election Day in 2020 than ever before and the number of undecided voters was relatively small. Only 4% of poll respondents, on average, gave a response other than “Biden” or “Trump” when asked by state-level presidential polls conducted in the final two weeks. Unlike in 2016, respondents deciding in the last week were as likely to support Biden as Trump, according to the National Election Pool exit polls.
  • Polling error was not caused by a failure to weight by education. A suspected factor in 2016 polling error was the failure to weight by education (Kennedy et al. 2016). In the final two weeks of the 2020 election, 317 state-level presidential polls (representing 72% of all polls conducted during this period) provided information on the statistical adjustments accounting for coverage and nonresponse issues; of these 317 polls, 92% accounted for education level in the final results. 
  • Polling error was not primarily caused by incorrect assumptions about the composition of the electorate in terms of age, race, ethnicity, gender, or education level. There is no evidence that polling error was caused by the underrepresentation or overrepresentation of particular demographics. Reweighting survey data to match the actual outcome reveals only minor changes to demographic-based weights. 
  • Polling error was not primarily caused by respondents’ reluctance to tell interviewers they supported Trump. The overstatement of Democratic support occurred regardless of mode and the overstatement of Democratic support was larger in races that did not involve Trump (i.e., senatorial and gubernatorial contests).
  • Polling error cannot be explained by error in estimating whether Democratic and Republican respondents voted. Trump supporters and Biden supporters were equally likely to vote after saying they would. This conclusion is based on validating the vote of registration-based samples shared with the Task Force by some AAPOR Transparency Initiative members. 
  • Polling error was not caused by the polls having too few Election Day voters or too many early voters. Among the 23 state-level presidential polls conducted in the final two weeks that reported how respondents said they would vote, the proportion of Election Day voters closely matched the percentage of certified votes cast on Election Day.

The report reaches no definitive conclusions on what went wrong, but offers some plausible hypotheses:
  • At least some of the polling error in 2020 was caused by unit nonresponse. The overstatement of Democratic support could be attributed to unit nonresponse in several ways: between-party nonresponse, that is, too many Democrats and too few Republicans responding to the polls; within-party nonresponse, that is, differences in the Republicans and Democrats who did and did not respond to polls; or issues related to new voters and unaffiliated voters in terms of size (too many or too few) or representativeness (for example, were the new voters who responded to polls more likely to support Biden than new voters who did not respond to the polls?). Any of these unit nonresponse factors could have contributed to the observed polling error. Without knowing how nonrespondents compare to respondents we cannot conclusively identify the primary source of polling error.
  • Factors that worked well in correcting for nonresponse in previous elections (including demographic composition, partisanship, or 2016 vote) did not render accurate vote estimates for the 2020 election. Poll data provided by some AAPOR Transparency Initiative members were reweighted to match the 2020 certified outcome. It was necessary to increase the percentage of Republicans (or 2016 Trump voters) and decrease the percentage of Democrats (or 2016 Clinton supporters) in the outcome-reweighted sample. In contrast, there are only slight differences between the originally weighted poll data and the outcome-reweighted data in terms of standard demographic categories.
  • Weighting to a reasonable target for partisanship and past 2016 vote does not fully correct the polling error. Reweighting the polls to reproduce the 2020 outcome requires a much larger margin for Trump in 2016 than actually occurred among respondents who report voting in 2016. The larger 2016 margin for Trump among those who reported voting for Trump in 2016 could be caused by the following: an issue with the weighting targets, i.e., the implied vote share among 2016 voters who voted in 2020 was different from the 2016 actual outcome; or differences in opinion within groups that responded, e.g., the 2016 Trump supporters who responded to polls were more likely to vote for Democrats than those who did not. It is impossible to know which caused the larger 2016 margin.
  • It is possible that 2020 pre-election polls were not successful in correctly accounting for new voters who participated in the 2020 election. There were many new voters in 2020 and it is unclear whether the proportion of new voters in the polls matched the proportion of actual new voters. It is also unclear whether the new voters who responded to polls had similar opinions to those who did not respond. Given the relative proportion and self-reported voting behavior of these new voters in the data available to the Task Force, this group of voters pushed the overall polling margins in the Democratic direction. Error in polling this group could have produced the observed polling error.
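
To make the difference between between-party and within-party nonresponse concrete, here is a minimal Python sketch. It is not the Task Force's method, and every number in it is made up for illustration: a hypothetical two-party electorate split 50/50, invented response rates, and invented levels of Democratic support within each party. The function names (`respondent_shares`, `dem_share`, `poststrat_weights`) are likewise hypothetical. The sketch shows that weighting to a correct partisanship target can repair between-party nonresponse but does nothing if the partisans who answer polls differ from the ones who do not.

```python
def respondent_shares(pop_shares, response_rates):
    """Party mix among poll respondents when groups respond at different rates."""
    raw = {g: pop_shares[g] * response_rates[g] for g in pop_shares}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}


def dem_share(group_shares, dem_support):
    """Democratic vote share implied by a party mix and within-party support."""
    return sum(group_shares[g] * dem_support[g] for g in group_shares)


def poststrat_weights(sample_shares, target_shares):
    """Per-group weights that move the sample's party mix onto the target mix."""
    return {g: target_shares[g] / sample_shares[g] for g in sample_shares}


# Hypothetical electorate (all values are illustrative assumptions).
pop_shares = {"Dem": 0.5, "Rep": 0.5}          # true party mix
true_support = {"Dem": 0.9, "Rep": 0.1}        # true Democratic support by party
response_rates = {"Dem": 0.012, "Rep": 0.008}  # Democrats answer more often

true_result = dem_share(pop_shares, true_support)

# Between-party nonresponse: too many Democrats end up in the raw sample.
sample = respondent_shares(pop_shares, response_rates)
unweighted = dem_share(sample, true_support)

# Weighting to the true party mix removes that particular error.
weights = poststrat_weights(sample, pop_shares)
weighted_mix = {g: sample[g] * weights[g] for g in sample}
weighted = dem_share(weighted_mix, true_support)

# Within-party nonresponse: the Republicans who respond are assumed to be
# more Democratic-leaning than Republicans overall. Party weighting cannot
# see this, so the weighted estimate still overstates Democratic support.
respondent_support = {"Dem": 0.9, "Rep": 0.2}
weighted_but_biased = dem_share(weighted_mix, respondent_support)

print(f"true result:               {true_result:.2f}")
print(f"unweighted poll:           {unweighted:.2f}")
print(f"weighted by party:         {weighted:.2f}")
print(f"weighted, biased partisans: {weighted_but_biased:.2f}")
```

Under these invented inputs, the unweighted poll puts the Democratic share at 0.58 against a true 0.50, party weighting brings it back to 0.50, and the within-party scenario leaves it at 0.55 even after weighting. That last case mirrors the report's point that without knowing how nonrespondents compare to respondents, the source of the error cannot be pinned down from the poll data alone.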