
Election Update #5: Does Trump Always Beat the Polls?

It’s time to talk about polls. Currently, most polling averages have Kamala Harris leading Donald Trump by 1-2 points in Michigan, Wisconsin, and Pennsylvania—three states that would probably combine to deliver her an electoral college victory. Most of her supporters, though, won’t be very reassured by this, thanks to traumatic memories of the polling misfires of 2016 and 2020, which underestimated Trump’s eventual performance. Won’t the polls underestimate Trump once again? Doesn’t he always beat the polls?

Well, let’s dig deeper. First, let’s revisit 2016 (an admittedly unpleasant exercise). Barack Obama had just won two consecutive general elections, including outperforming the polls in 2012 to decisively beat a strong opponent, Mitt Romney. His sky-high margins with voters of color and steady performance with working-class Whites seemed to represent an unbeatable coalition, promising endless Democratic victories into the future. Furthering this perception, the Republican Party appeared in utter disrepair, having failed to prevent the nomination of a rambling, braggadocious billionaire with no political expertise, an atrocious record of scandals, a baffling soft spot for Vladimir Putin, and a penchant for melting down on Twitter in response to the slightest political criticism. Because of all this, a persistent media narrative emerged that Hillary Clinton was primed for a breezy victory.

The problem was that the polling numbers didn’t quite support this. In reality, the final national polls showed Clinton ahead of Trump by only 3.5 points on average—hardly a forecast of a blowout. In addition, Trump had made clear gains over the final two weeks, suggesting he stood to gain even further in the final days after polling concluded. Ultimately, Clinton won the national popular vote by 2.1 points, only 1.4 points below what the final polls had shown. But because the media had complacently ignored the warning signs for her in the data, a narrative emerged after the election that the polls had missed terribly, fooling everybody. So 2016 offers primarily a lesson in media unreliability, not polling uncertainty.

Despite this, you may correctly point out that a 1.4-point underestimation of Trump isn’t nothing, and the miss was larger in the crucial Midwest swing states that enabled his electoral college victory. What accounted for this miss?

The answer has to do with the fact that all polls “weight” their samples to ensure that their panels accurately reflect the true composition of the electorate. For instance, polls will apply weights to ensure the proper percentages of Black, Hispanic, and White respondents. This prevents high response rates by one group from dominating the numbers and skewing the poll. However, one trait that had not been weighted traditionally was college education. This was because it simply hadn’t been necessary: a college degree hadn’t been a major predictor of voting habits, so there was no reason to ensure proper percentages of respondents with and without degrees.

Trump’s populist campaign in 2016, though, changed that. Unlike prior Republican candidates (like Romney), much of his messaging appealed to less educated voters while repelling more educated voters. This caught pollsters flat-footed. Since they hadn’t weighted their samples for college education, their numbers were skewed toward college-educated voters, who are more likely to answer calls from pollsters. These highly educated voters disproportionately favored Clinton. Therefore, the polls’ results underestimated Trump, especially in states with large populations of non-college Whites who hadn’t been adequately represented in the data.
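
To make the mechanics concrete, here is a minimal sketch in Python of how a single education weight works. All of the numbers are invented for illustration; real pollsters weight on many variables at once, typically with raking algorithms rather than this one-variable arithmetic.

```python
# Minimal sketch of weighting a poll by education (invented numbers).
# Suppose the true electorate is 40% college / 60% non-college, but
# college graduates answer pollsters more readily, so the raw sample
# comes back 60% college / 40% non-college.
population_share = {"college": 0.40, "non_college": 0.60}
sample_share     = {"college": 0.60, "non_college": 0.40}

# Hypothetical Clinton support within each group.
clinton_support  = {"college": 0.55, "non_college": 0.45}

# Unweighted topline: dominated by the over-responding college group.
unweighted = sum(sample_share[g] * clinton_support[g] for g in sample_share)

# Weight each group so its share of the sample matches the electorate.
weights  = {g: population_share[g] / sample_share[g] for g in sample_share}
weighted = sum(sample_share[g] * weights[g] * clinton_support[g]
               for g in sample_share)

print(f"Unweighted Clinton share: {unweighted:.1%}")  # 51.0%
print(f"Weighted Clinton share:   {weighted:.1%}")    # 49.0%
```

The point is simply that when the weight is never applied, as was the case with education before 2016, the over-responding group’s preferences dominate the topline.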

Pollsters realized their mistake, and in 2020 they duly corrected it by weighting their samples for college education. But here’s where things get strange: in 2020, the polling miss was actually larger than in 2016. Averages generally showed Joe Biden with a healthy national lead of 8.5 points, but he ultimately won the popular vote by only 4.5 points, barely enough to carry the electoral college.

What gives? The problem this time was more mysterious. As Nate Cohn of The New York Times wrote, the polls’ clear bias toward Biden even with proper weights indicated a much more fundamental problem with polling itself:

This is a deeper kind of error than ones from 2016. It suggests a fundamental mismeasurement of the attitudes of a large demographic group, not just an underestimate of its share of the electorate. Put differently, the underlying raw survey data got worse over the last four years, canceling out the changes that pollsters made to address what went wrong in 2016.

Essentially, although college education was now weighted properly, a new problem had arisen: the kinds of voters who responded to polls were more likely to support Biden regardless of education or any other demographic trait. The polling practices were fine, but the data itself was flawed, because people were responding to polls at asymmetric rates that favored Democrats.
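
A small extension of the earlier sketch, again with invented numbers, shows why no demographic weight can fix this kind of error: if Biden voters respond at a higher rate within every demographic cell, the cells themselves can be weighted perfectly and the topline will still lean toward Biden.

```python
# Sketch of within-group nonresponse bias (all numbers invented).
# Assume education shares are weighted perfectly, but Biden voters are
# 10% likelier than Trump voters to answer the poll within each group.
population_share = {"college": 0.40, "non_college": 0.60}
biden_support    = {"college": 0.60, "non_college": 0.45}  # true support
response_rate    = {"biden": 0.011, "trump": 0.010}        # chance a voter responds

poll_estimate = 0.0
for group, pop in population_share.items():
    # Respondents from this group, split by true preference.
    biden_resp = pop * biden_support[group] * response_rate["biden"]
    trump_resp = pop * (1 - biden_support[group]) * response_rate["trump"]
    # Weighting pins each group's share of the sample to `pop`, but it
    # cannot undo the pro-Biden tilt among those who actually answered.
    poll_estimate += pop * biden_resp / (biden_resp + trump_resp)

true_share = sum(pop * biden_support[g] for g, pop in population_share.items())
print(f"True Biden share:        {true_share:.1%}")     # 51.0%
print(f"Weighted poll estimate:  {poll_estimate:.1%}")  # about 53.3%
```

In this toy setup, even a 10 percent gap in response rates inflates Biden’s margin by more than four points, roughly the scale of the 2020 miss.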

There are several reasonable theories for why this would happen. Chief among them, once again, is Trump. He has fomented a blanket paranoia among his supporters, actively encouraging distrust of ostensibly neutral institutions like science, medicine, intelligence agencies, the judiciary, public schools, and even the post office. It’s hardly surprising, then, that his supporters would refuse to take calls from pollsters. Why volunteer personal information to a stranger who could potentially be a covert agent of the Deep State? That’s partly a joking hypothetical—but only partly.

Another potential reason for the 2020 polling misfire: the COVID-19 pandemic. As David Shor, a Democratic data scientist, points out, in late 2019 Biden had a polling lead of 4-5 points, indicating a narrow electoral college advantage over Trump. But after lockdowns swept the nation in the spring of 2020, Biden’s polling lead ballooned to 8-10 points, where it stayed until the election. Shor believes that, in retrospect, Biden hadn’t actually gained supporters over that timeframe. Instead, the pandemic had made Democratic voters both more motivated and more available to take polls:

The basic story is that after lockdown, Democrats just started taking surveys, because they were locked at home and didn’t have anything else to do. Nearly all of the national polling error can be explained by the post-COVID jump in response rates among Dems.

Some may recall “The Resistance,” the groundswell of anti-Trump activism that reached a crescendo during Trump’s erratic handling of the pandemic. Shor is arguing, essentially, something unsurprising: that the resistors really liked being polled.

That brings us to 2024. The pandemic is no longer at hand, so the primary concern for pollsters carrying over from 2020 is the first dynamic I mentioned: the potential difficulty reaching Trump supporters who, almost by definition, don’t trust the outlets that aim to poll them. Pollsters have, to their credit, attempted to resolve this in various ways.

The first, applicable especially to state polling, is to weight by region. After all, it’s one thing to reach non-college White voters; it’s another to reach non-college Whites in the rural regions of a given state. The latter approach will probably prove more fruitful in reaching Trump’s biggest fans.

Another is that some pollsters have begun counting respondents’ vote preference even if they fail to complete the entire poll. This change was made, apparently, because many Trump supporters would cut the pollster off early in the call, exclaim that they were voting for Trump, and hang up. Under previous guidelines, this wouldn’t have counted as a response; now some pollsters, The New York Times among them, count it as a Trump response in the data. Theoretically, this could mitigate the problem of the “paranoid Trump voter,” making polls more inclusive of respondents who aren’t willing to volunteer information over the course of a multi-question survey.

The most influential change, though, may be the newfound predominance of weighting by recalled 2020 vote. This practice entails asking participants whom they voted for in 2020 and applying weights to ensure that these percentages match the actual 2020 electorate. It might seem logical to do this, but such methodology is actually notoriously unreliable. Cohn has written about this extensively, but the most relevant problem is that, for some reason, a small percentage of poll respondents inevitably report that they voted for the winner of the last election when they actually voted for the loser. This causes an oversampling of those who voted for the losing party because some of them have essentially disguised themselves in the poll as voters for the winning party. This in turn causes the false appearance of voters shifting their preference toward the previous loser, when in fact many of these voters actually voted for the loser the first time, too. They just don’t want to admit it.
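
To see the mechanism in numbers, here is a back-of-the-envelope sketch, with figures I’ve invented rather than anything drawn from Cohn’s analysis: a slice of actual Trump voters recall voting for Biden, the pollster weights the recall groups back to the true 2020 result, and the weighted sample ends up with too many genuine 2020 Trump voters.

```python
# Back-of-the-envelope sketch of recalled-vote weighting gone wrong
# (invented numbers, simplified to the two-party 2020 vote).
true_2020 = {"biden": 0.52, "trump": 0.48}

# Assume the sample mirrors the 2020 electorate exactly, except that
# 4% of actual Trump voters report having voted for the winner, Biden.
misreport_rate = 0.04
recalled_biden = true_2020["biden"] + true_2020["trump"] * misreport_rate
recalled_trump = true_2020["trump"] * (1 - misreport_rate)

# The pollster weights the recall groups back to the real 2020 shares.
w_biden_recall = true_2020["biden"] / recalled_biden  # < 1, downweighted
w_trump_recall = true_2020["trump"] / recalled_trump  # > 1, upweighted

# Share of *actual* 2020 Trump voters after weighting: everyone who
# recalled Trump, plus the disguised Trump voters hiding in the
# Biden-recall group.
effective_trump_2020 = (recalled_trump * w_trump_recall
                        + true_2020["trump"] * misreport_rate * w_biden_recall)

print(f"Actual 2020 Trump share:           {true_2020['trump']:.1%}")   # 48.0%
print(f"Trump-2020 voters after weighting: {effective_trump_2020:.1%}") # about 49.9%
```

Those disguised voters then look like Biden-to-Trump switchers when they state a current preference, producing exactly the false shift toward the previous loser described above.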

Cohn writes that nearly two-thirds of pollsters now weight by recalled vote, compared with fewer than ten percent in 2020. This would theoretically introduce a systematic polling bias toward Trump, and indeed, as Cohn shows in the table below, pollsters who don’t weight by recalled vote have generally found a slightly more comfortable edge for Harris in the crucial Midwest states:

Furthermore, he speculates that this pro-Trump bias is the very point of using this weighting strategy: pollsters who obtain very left-leaning data want to avoid underestimating Trump drastically, as they did in 2020, so they weight by recalled vote to counter what they perceive as an inevitable anti-Trump bias. This is why, in the chart above, some results, especially the national average, are actually better for Trump without the recalled vote weighting: it’s because the pollsters with the most left-leaning data (often online national polls) disproportionately use the strategy. Their numbers start out very heavily Democratic, so they intentionally skew them back toward Trump because they don’t trust their own results.

To me, this all means that we’re in very uncertain waters regarding the real status of this election. On one hand, optimists for Harris could point out that pollsters are essentially handing points to Trump by adopting a debunked weighting practice that favors the last election’s losing party. On the other hand, Trump supporters could, with reasonable evidence, argue that polling itself is now heavily biased against Trump due to his supporters’ aversion to being surveyed—a bias that the new weighting methods could never hope to erase. (Recall the very large 4-point national miss in 2020.)

The big question, then, may be how much the COVID-19 pandemic truly impacted the 2020 polling. If the 4-point miss in 2020 was indeed mostly attributable to polling distrust among Trump’s supporters, then Trump is probably in good shape, since nothing has occurred to alter that dynamic. But if, as Shor theorizes, the 2020 miss was largely because the unique environment of the pandemic caused Trump-hating voters to disproportionately pervade polling panels, then Harris may actually be in significantly better shape than the polling indicates. After all, if pollsters have now adopted pro-Trump weighting methods only because he outperformed their numbers in 2020—but if that overperformance was actually due to 2020-specific factors that no longer apply—then we may see Harris, not Trump, outperform the polls this time.

Remember that in the 2022 midterms, with the pandemic effectively over and several Trumpian candidates running for office in swing states, the polls slightly underestimated…Democrats.

 

–Jim Andersen