Election Update #8: The Herding Monster Rears Its Head

In the final weeks before the election, the polls have tightened. Overall, this has benefited Donald Trump: he’s gained 1-2 points on Kamala Harris in national and swing state averages, causing polls-based statistical models to cast the race as a 50-50 tie or even give Trump a slight edge.

As the final data comes in, however, something looks fishy.

While it’s perfectly reasonable for the polls to average out to a tie, nearly every individual swing state poll is now showing a tie, or at most a one-point lead for either candidate. You might assume that such consistency merely proves that the race is, in fact, tied. But since polls typically carry margins of error of 3-5 points (due to the random error inherent in sampling a population), this grouping, or “herding,” of polls into a very narrow consensus is not consistent with a mathematically reasonable distribution. In other words, if the race really were tied and pollsters were releasing their data unfiltered, we would see a smattering of poll numbers loosely clustered around zero: Harris up four, Trump up three, et cetera. Instead, polls have shown almost no variation.
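
To put rough numbers on that intuition, here is a minimal Python sketch. The sample size of 800 and the batch of 10 polls are illustrative assumptions, not figures from any actual poll, and the model assumes a simple random sample of a two-way race with no undecided voters. Under those assumptions, it estimates how often a single poll of a truly tied race would publish a lead of one point or less, and how often ten independent polls would all do so.

```python
from math import sqrt
from statistics import NormalDist


def margin_sd_points(n: int, share_a: float = 0.5, share_b: float = 0.5) -> float:
    """Standard deviation, in percentage points, of the sampled margin (A minus B).

    Assumes simple random sampling, so the two shares are multinomial and
    Var(A - B) = [pA + pB - (pA - pB)^2] / n.
    """
    variance = (share_a + share_b - (share_a - share_b) ** 2) / n
    return 100 * sqrt(variance)


def prob_margin_within(points: float, n: int) -> float:
    """Normal-approximation probability that a tied race polls within +/- points."""
    dist = NormalDist(mu=0.0, sigma=margin_sd_points(n))
    return dist.cdf(points) - dist.cdf(-points)


SAMPLE_SIZE = 800   # hypothetical respondents per poll
NUM_POLLS = 10      # hypothetical number of independent polls

# A published "lead of +1 or less" covers raw margins that round to -1, 0, or +1,
# i.e. anything strictly inside +/- 1.5 points.
p_one = prob_margin_within(1.5, SAMPLE_SIZE)
p_all = p_one ** NUM_POLLS

print(f"SD of the margin: {margin_sd_points(SAMPLE_SIZE):.2f} points")
print(f"Chance one poll shows a lead of 1 or less: {p_one:.1%}")
print(f"Chance all {NUM_POLLS} polls do: {p_all:.5%}")
```

Under these assumptions, a single honest poll of a tied race should show a lead larger than one point more often than not, so a long unbroken run of ties and one-point leads would be very unlikely to arise by chance alone. Real polls use weighting and more complex samples, so the exact figures would differ.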

The only possible statistical explanation for this is that pollsters are deliberately influencing their topline numbers to avoid going out on a limb against the consensus. For example, if their raw data were to show Trump up by four in Georgia, they would alter it so that the final, published line shows Trump up by only one, more in line with the consensus. This way, if, say, Harris wins Georgia by one, the pollster won’t be too far off, and they won’t be singled out for being “wrong.”

Thus, the herding evident in the current data casts doubt on polls’ toplines. Indeed, several pundits, such as Nate Silver, have begun raising the alarm that the data isn’t adding up. And when Silver speaks on this, we should believe him: the last time he did so with such pointedness was in 2016, when final polls had suspiciously herded around a Hillary Clinton advantage of +3 to +4. As Silver and others had warned, it turned out to be a sign that pollsters were avoiding publishing data that ran contrary to the consensus—in that case, the consensus that Clinton was the clear favorite to win.

So what might the pollsters be hiding this time? More specifically, which candidate might be benefiting from the apparent herding?

We don’t know. And it’s entirely possible that the herding is benefiting neither candidate: maybe the race really is dead even, and the pollsters are merely herding their outlying numbers from both directions toward the center, where the truth actually lies.

But, as with my previous piece on early voting data, just because we don’t know won’t stop me from guessing. And I’m guessing that the polls are herding somewhat toward Trump, making it seem as though he’s doing somewhat better than he really is.

I base this optimism (being a Harris supporter) on a few pieces of evidence.

The first is that pollsters have already made public changes to their weighting methods that favor Trump. I covered this in Update #5. To summarize what I described there: following the second consecutive underestimation of Trump in the 2020 election, pollsters adopted certain weighting strategies designed to prevent a third underestimation. In particular, most pollsters now weight by recalled vote, a method that notoriously understates support for the party that won the previous election (in this case, the Democrats).

This indicates to me that pollsters in 2024 are more concerned about underestimating Trump than overestimating him. And there’s good reason for this: their credibility is on the line. Following their leans toward Democrats in 2016 and 2020, polls have come under fire from Trump and other Republicans, who have accused them, essentially, of being Deep State operations meant to sabotage them. It’s not surprising that pollsters would want to avoid attracting these criticisms again. In fact, dynamics like these explain why political polls have never underestimated the same party in three consecutive presidential elections: pollsters make adjustments to prevent the errors of the previous cycle and naturally wind up overcompensating.

Given the apparent concern for underestimating Trump once again, which has been reflected in pollsters’ behavior so far, it seems to me more likely that herding would be undertaken to boost Trump’s support, not diminish it.

The second reason I suspect that pollsters are herding in favor of Trump and against Harris is the strange data that has been released in a few swing state polls recently, which may give us a look under the hood at what might be going on more broadly. Take the TIPP poll of Pennsylvania released in mid-October. It reported a topline among registered voters (RV) of Harris +4. But among likely voters (LV), they reported that Trump was actually ahead by a point.

How could this be? LV samples typically differ from RV samples by only one or two points, not five. TIPP, to their credit, released their data, so the answer was soon uncovered on social media: they had employed a likely voter screening formula that, somehow, filtered out almost the entire city of Philadelphia. The RV sample included 93 respondents from Philadelphia, but only 12 of those were counted as “likely” voters by the screening formula. Meanwhile, in other regions of the state, the formula screened out only about ten percent of voters.
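
The size of that gap is easier to see as retention rates. The short Python sketch below uses only the figures quoted above (93 registered and 12 likely respondents in Philadelphia, and roughly 90 percent retention elsewhere); the function and variable names are my own, and the 90 percent figure is the approximate one given here, not an exact value from TIPP's data.

```python
def retention_rate(registered: int, likely: int) -> float:
    """Share of registered-voter respondents that a likely voter screen keeps."""
    if registered <= 0:
        raise ValueError("registered must be positive")
    return likely / registered


PHILLY_RV = 93              # Philadelphia respondents in the RV sample
PHILLY_LV = 12              # of those, respondents who passed the LV screen
ELSEWHERE_RETENTION = 0.90  # approximate: "screened out about ten percent"

philly_retention = retention_rate(PHILLY_RV, PHILLY_LV)

# How many Philadelphia respondents would have passed if the screen had
# treated the city like the rest of the state.
expected_philly_lv = PHILLY_RV * ELSEWHERE_RETENTION

print(f"Philadelphia retention: {philly_retention:.1%}")
print(f"Retention elsewhere:    {ELSEWHERE_RETENTION:.0%}")
print(f"Philadelphia LVs at the statewide rate: about {expected_philly_lv:.0f}")
```

On these figures, the screen kept roughly 13 percent of Philadelphia respondents, against roughly 90 percent everywhere else.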

Something was clearly awry. And later that week, another Pennsylvania poll was released by Franklin & Marshall. Once again, it showed Harris up by four points among registered voters—but Trump ahead by one among likely voters.

Data like this indicates to me that pollsters are using likely voter screens, whose methods are private and entirely up to each pollster’s discretion, to push toplines toward Trump. This can be an effective form of herding, and it’s theoretically easy to do, especially since the Democratic coalition includes some lower-propensity voting demographics. For example, a pollster could justifiably deem a portion of its nonwhite respondents “unlikely” voters and remove them from the sample, since nonwhite people are indeed less likely to vote than White people. Of course, the pollster would have to leave a weighted total of nonwhite voters in the sample, but they could manipulate which voters were removed, thereby overweighting more Trump-friendly nonwhite voters, for instance those who live in non-urban regions. This seems to me the most probable explanation for the TIPP numbers.

Plus, to me, the sudden and widespread phenomenon of Trump performing better in LV data than in RV data doesn’t pass the smell test. In most polls this year until the past two weeks, Trump slipped a bit among LVs compared to RVs. It makes sense that this would be the case: as I covered in Update #6, many of Trump’s supporters profile as lower-propensity voters, lacking college educations and generally distrusting politics. Meanwhile, Democrats have soared in popularity with college-educated White people above 45 years old, who profile as the highest-propensity voters. So it makes sense to me that Democrats would do relatively well in LV data, as they had for most of the cycle. But recent numbers have swung in the opposite direction, including some extreme examples like the aforementioned TIPP and F&M polls in Pennsylvania.

As previously described, pollsters have an obvious motive to jigger their LV data toward Trump: they don’t want to appear permanently biased against Republicans by underestimating him for a third consecutive time. But are they overcorrecting?

The third reason I suspect a general herding pattern in Trump’s favor is the results of district-level polls over the past few weeks. Each of these polls covers a single House of Representatives district, not a whole state (or the whole country). They’re known to be more accurate than state polls, since they sample a relatively small, homogeneous area and therefore encounter fewer challenges related to regional and demographic weighting. And unlike state polls, these smaller polls have recently shown Harris equaling or exceeding Biden’s 2020 margins.

In Pennsylvania’s 8th district, for example, Trump leads Harris by 3 points according to a Noble Insights poll released on October 25th. That’s the same margin by which Trump carried the district in 2020, when he lost the state by 1.2 points. So the poll represents decent news for Harris. Similarly, in a Noble poll of AZ-02, a mostly rural district that Trump carried by 10 points in 2020 en route to losing Arizona by a few tenths of a point, he currently leads by 9. Again, very similar numbers in a state that Democrats narrowly won in 2020. In a poll of PA-10, a suburban district, Harris leads Trump by 5 points, although Trump won the district in 2020 by 4 points. The recent result matches the margin achieved by John Fetterman when he won the district by 5 points in 2022 on the way to winning the state overall by the same margin. In summary, then, it’s great news for Democrats, and more evidence that the suburbs are ground zero for Democratic gains.
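
For readers who want the comparison in one place, the following Python sketch tabulates the three district polls against the 2020 results, using only the rounded margins quoted above. The sign convention (positive for a Democratic lead) and all names are my own, and a simple average of three polls is a rough illustration, not a forecast.

```python
from statistics import mean

# Signed margins in percentage points: positive = Democratic lead,
# negative = Trump lead. All values are the rounded figures quoted in the text.
DISTRICTS = {
    "PA-08": {"poll_2024": -3, "result_2020": -3},
    "AZ-02": {"poll_2024": -9, "result_2020": -10},
    "PA-10": {"poll_2024": 5, "result_2020": -4},
}


def swing_toward_democrats(poll_margin: int, baseline_margin: int) -> int:
    """Change in the Democratic margin from the baseline to the poll."""
    return poll_margin - baseline_margin


swings = {
    name: swing_toward_democrats(d["poll_2024"], d["result_2020"])
    for name, d in DISTRICTS.items()
}
average_swing = mean(swings.values())

for name, swing in swings.items():
    print(f"{name}: {swing:+d} points toward Democrats vs. 2020")
print(f"Simple average: {average_swing:+.1f} points")
```

None of the three districts shows movement toward Trump relative to 2020, which is the pattern the argument here rests on.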

The district polls are valuable when trying to look past the effects of herding. This is because pollsters have no reason to herd them. After all, 1) there’s no public scrutiny of district-level polls and 2) there’s no consensus to herd toward. So Harris’s apparent overperformance in district polls compared to state and national polls could indicate that the latter numbers are being nudged toward Trump by skittish pollsters hoping to avoid MAGA’s wrath when the dust settles in November. Some have pointed out that in 2016, district polls began showing warning signs for Clinton in September and October. These turned out to be harbingers of an eventual polling miss that was likely due in part to pollsters herding numbers in Clinton’s favor, wanting to avoid publishing consensus-defying, pro-Trump data.

I’ve laid out my reasons for interpreting the 2024 herding phenomenon as an indication that pollsters may be slightly underestimating Harris in crucial battlegrounds. There’s no direct proof of this, of course, and my interpretation might reflect bias toward her candidacy. But for Harris supporters looking for a bit of optimism as pundits proclaim the race a tossup, this is the optimist’s case.

 

–Jim Andersen