In Defying the Odds, we talk about the social and economic divides that enabled Trump to enter the White House. In Divided We Stand, we discuss how these divides played out in 2020. In both elections, however, polls tended to overstate Democratic margins.
From the AAPOR report on polling problems in 2020:
Several proposed explanations can be ruled out as primary sources of polling error in 2020. Our analyses suggest:
- Polling error was not caused by late-deciding voters voting for Republican candidates. More voters voted
prior to Election Day in 2020 than ever before and the number of undecided voters was relatively small. Only
4% of poll respondents, on average, gave a response other than “Biden” or “Trump” when asked by state-level
presidential polls conducted in the final two weeks. Unlike in 2016, respondents deciding in the last week
were as likely to support Biden as Trump, according to the National Election Pool exit polls.
- Polling error was not caused by a failure to weight by education. A suspected factor in 2016 polling error was
the failure to weight by education (Kennedy et al. 2016). In the final two weeks of the 2020 election, 317
state-level presidential polls (representing 72% of all polls conducted during this period) provided information
on the statistical adjustments accounting for coverage and nonresponse issues; of these 317 polls, 92%
accounted for education level in the final results.
- Polling error was not primarily caused by incorrect assumptions about the composition of the electorate in
terms of age, race, ethnicity, gender, or education level. There is no evidence that polling error was caused
by the underrepresentation or overrepresentation of particular demographics. Reweighting survey data to
match the actual outcome reveals only minor changes to demographic-based weights.
- Polling error was not primarily caused by respondents’ reluctance to tell interviewers they supported
Trump. The overstatement of Democratic support occurred regardless of mode and the overstatement of
Democratic support was larger in races that did not involve Trump (i.e., senatorial and gubernatorial contests).
- Polling error cannot be explained by error in estimating whether Democratic and Republican respondents
voted. Trump supporters and Biden supporters were equally likely to vote after saying they would. This
conclusion is based on validating the vote of registration-based samples shared with the Task Force by some
AAPOR Transparency Initiative members.
- Polling error was not caused by the polls having too few Election Day voters or too many early voters.
Among the 23 state-level presidential polls conducted in the final two weeks that reported how respondents
said they would vote, the proportion of Election Day voters closely matched the percentage of certified votes
cast on Election Day.
The report reaches no definitive conclusions on what went wrong, but offers some plausible hypotheses:
- At least some of the polling error in 2020 was caused by unit nonresponse. The overstatement of Democratic
support could be attributed to unit nonresponse in several ways: between-party nonresponse, that is, too
many Democrats and too few Republicans responding to the polls; within-party nonresponse, that is,
differences in the Republicans and Democrats who did and did not respond to polls; or issues related to new
voters and unaffiliated voters in terms of size (too many or too few) or representativeness (for example, were
the new voters who responded to polls more likely to support Biden than new voters who did not respond to
the polls?). Any of these unit nonresponse factors could have contributed to the observed polling error.
Without knowing how nonrespondents compare to respondents we cannot conclusively identify the primary
source of polling error.
- Factors that worked well in correcting for nonresponse in previous elections (including demographic
composition, partisanship, or 2016 vote) did not render accurate vote estimates for the 2020 election. Poll
data provided by some AAPOR Transparency Initiative members were reweighted to match the 2020 certified
outcome. It was necessary to increase the percentage of Republicans (or 2016 Trump voters) and decrease the
percentage of Democrats (or 2016 Clinton supporters) in the outcome-reweighted sample. In contrast, there
are only slight differences between the originally weighted poll data and the outcome-reweighted data in
terms of standard demographic categories.
- Weighting to a reasonable target for partisanship and past 2016 vote does not fully correct the polling error.
Reweighting the polls to reproduce the 2020 outcome requires a much larger margin for Trump in 2016 than
actually occurred among respondents who report voting in 2016. The larger 2016 margin for Trump among
those who reported voting for Trump in 2016 could be caused by the following: an issue with the weighting
targets, i.e., the implied vote share among 2016 voters who voted in 2020 was different from the 2016 actual
outcome; or differences in opinion within groups that responded, e.g., the 2016 Trump supporters who
responded to polls were more likely to vote for Democrats than those who did not. It is impossible to know
which caused the larger 2016 margin.
- It is possible that 2020 pre-election polls were not successful in correctly accounting for new voters who
participated in the 2020 election. There were many new voters in 2020 and it is unclear whether the
proportion of new voters in the polls matched the proportion of actual new voters. It is also unclear whether
the new voters who responded to polls had similar opinions to those who did not respond. Given the relative
proportion and self-reported voting behavior of these new voters in the data available to the Task Force, this
group of voters pushed the overall polling margins in the Democratic direction. Error in polling this group
could have produced the observed polling error.
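
The outcome-reweighting exercise the report describes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy respondent counts, the 52-48 target result, and the function names (weighted_shares, reweight_to_outcome) are hypothetical and do not come from the Task Force's data or code. The idea is to scale each respondent's weight so that the weighted vote shares match a target result, then compare the sample's party and education composition before and after.

```python
from collections import defaultdict

# Hypothetical toy poll, invented for illustration only.
# Each row: (vote, party, education, weight). The weight starts as a
# respondent count, so the original poll shows Biden 54, Trump 46.
TOY_POLL = [
    ("Biden", "D", "college", 25.0),
    ("Biden", "D", "noncollege", 20.0),
    ("Biden", "I", "college", 5.0),
    ("Biden", "I", "noncollege", 4.0),
    ("Trump", "R", "college", 18.0),
    ("Trump", "R", "noncollege", 22.0),
    ("Trump", "I", "college", 3.0),
    ("Trump", "I", "noncollege", 3.0),
]

VOTE, PARTY, EDUC = 0, 1, 2  # column positions in each row


def weighted_shares(rows, column):
    """Return each category's share of the total weight for one column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row[-1]
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


def reweight_to_outcome(rows, target_vote_shares):
    """Scale weights so weighted vote shares equal the target shares."""
    current = weighted_shares(rows, VOTE)
    return [
        row[:-1] + (row[-1] * target_vote_shares[row[VOTE]] / current[row[VOTE]],)
        for row in rows
    ]


# Assumed target result (hypothetical, not an actual certified outcome).
TARGET = {"Biden": 0.52, "Trump": 0.48}

reweighted = reweight_to_outcome(TOY_POLL, TARGET)

vote_after = weighted_shares(reweighted, VOTE)
party_before = weighted_shares(TOY_POLL, PARTY)
party_after = weighted_shares(reweighted, PARTY)
educ_before = weighted_shares(TOY_POLL, EDUC)
educ_after = weighted_shares(reweighted, EDUC)

print("Vote after reweighting:", vote_after)
print("Party before:", party_before)
print("Party after: ", party_after)
print("Education before:", educ_before)
print("Education after: ", educ_after)
```

In this constructed example, matching the target moves the Republican share by more than a percentage point while the college share barely moves, because vote choice here is tied closely to party and only loosely to education. That mirrors the pattern the report describes, but the numbers themselves are invented.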