I don’t mean to turn by blog into political nonsense central, but I just can’t pass on some of these insane arguments.

This morning, the state of Texas sued four other states to overturn the results of our presidential election. As part of their suit, they included an "expert analysis" that claims that the odds of the election results in the state of Georgia being legitimate are worse that 1 in one quadrillion. So naturally, I had to take a look.

Here’s the meat of the argument that they’re claiming "proves" that the election results were fraudulent.

I tested the hypothesis that the performance of the two Democrat candidates were statistically similar by comparing Clinton to Biden. I use a Z-statistic or score, which measures the number of standard deviations the observation is above the mean value of the comparison being made. I compare the total votes of each candidate, in two elections and test the hypothesis that other things being the same they would have an equal number of votes.

I estimate the variance by multiplying the mean times the probability of the candidate not getting a vote. The hypothesis is tested using a Z-score which is the difference between the two candidates’ mean values divided by the square root of the sum of their respective variances. I use the calculated Z-score to determine the p-value, which is the probability of finding a test result at least as extreme as the actual results observed. First, I determine the Z-score comparing the number of votes Clinton received in 2016 to the number of votes Biden received in 2020. The Z-score is 396.3. This value corresponds to a confidence that I can reject the hypothesis many times more than one in a quadrillion times that the two outcomes were similar.

This is, to put it mildly, a truly trashy argument. I’d be incredibly ashamed if a high school student taking statistics turned this in.

What’s going on here?

Well, to start, this is deliberately bad writing. It’s using a whole lot of repetitive words in confusing ways in order to make it sound complicated and scientific.

I can simplify the writing, and that will make it very clear what’s going on.

I tested the hypothesis that the performance of the two Democrat candidates were statistically similar by comparing Clinton to Biden. I started by assuming that the population of eligible voters, the rates at which they cast votes, and their voting preferences, were identical for the two elections. I further assumed that the counted votes were a valid random sampling of the total population of voters. Then I computed the probability that in two identical populations, a random sampling could produce results as different as the observed results of the two elections.

As you can see from the rewrite, the "analysis" assumes that the voting population is unchanged, and the preferences of the voters are unchanged. He assumes that the only thing that changed is the specific sampling of voters from the population of eligible voters – and in both elections, he assumes that the set of people who actually vote is a valid random sample of that population.

In other words, if you assume that:

- No one ever changes their mind and votes for different parties candidates in two sequential elections;
- The population and its preferences never changes – people don’t move in and out of the state, and new people don’t register to vote;
- The specific people who vote in an election is completely random.

Then you can say that this election result is impossible and clearly indicates fraud.

The problem is, none of those assumptions are anywhere close to correct or reasonable. We know that people’s voting preference change. We know that the voting population changes. We know that who turns out to vote changes. None of these things are fixed constants – and any analysis that assumes any of these things is nothing but garbage.

But I’m going to zoom in a bit on one of those: the one about the set of voters being a random sample.

When it comes to statistics, the selection of a sample is one of the most important, fundamental concerns. If your sample isn’t random, then it’s not random. You can’t compare results for two samples as if they’re equivalent if they aren’t equivalent.

Elections aren’t random statistical samples of the population. They’re not even *intended* to be random statistical samples. They’re deliberately performed as counts of motivated individual who choose to come out and cast their votes. In statistical terms, they’re a self-selected, motivated sample. Self-selected samples are neither random nor representative in a statistical sense. There’s nothing wrong with that: an election isn’t intended to be a random sample. But it does mean that when you do statistical analysis, you *cannot* treat the set of voters as a random sampling of the population of elegible voters; and you cannot make any assumptions about uniformity when you’re comparing the results of two different elections.

If you could – if the set of voters was a valid random statistical sample of an unchanging population of eligible voters, then there’d be no reason to even *have* elections on an ongoing basis. Just have one election, take its results as the eternal truth, and just assume that every election in the future would be exactly the same!

But that’s not how it works. And the people behind this lawsuit, and particularly the "expert" who wrote this so-called statistical analysis, know that. This analysis is pure garbage, put together to deceive. They’re hoping to fool someone into believing that they actually prove something that they couldn’t prove.

And that’s despicable.