Excess COVID-19 Deaths

Have 200K+ Americans really died from COVID?

TL;DR: Is the 200K+ deaths figure genuine? The answer is most likely “YES, and it might be an underestimate”, but keep reading for a few interesting statistical and visualization techniques that show exactly why we can be confident in that answer.

One of the COVID-19 talking points that’s been making the rounds, ranging from baseless conspiracy accusations to genuine confusion, has been the number of “true” deaths from COVID-19. Most of this is in reference to data from the CDC, which mentions 6% of the total COVID-19 deaths having no other comorbidities in the autopsy reports. What many people have taken away from this statistic is that the true number of deaths, the people who died who wouldn’t have otherwise died, is much lower (in some cases interpreted as fewer than 12,000 deaths instead of ~200,000 in the United States). It’s been such a common talking point on both social media and the news cycles that you’d have to be actively avoiding COVID-19 news to have not come across it by now. In the early days of the first wave, the differences between death rates in different age groups were a common argument for why the fear of infection (and by extension the lockdowns and social distancing rules) was overblown. Now this is coupled with the new assertion that 94% of these deaths would have happened anyways, or that they were mainly due to other causes.

“And the stimulus bill that was intended to help with the hospitals that were being overwrought with Covid patients created an incentive to record something as Covid that is difficult to say no to, especially if your hospital’s going bankrupt for lack of other patients. So, the hospitals are in a bind right now. There’s a bunch of hospitals, they’re furloughing doctors, as you were mentioning. If your hospital’s half full, it’s hard to make ends meet. So now you’ve got like, “If I just check this box, I get 8,000 dollars. Put them on a ventilator for five minutes, I get 39,000 dollars back. Or, I got to fire some doctors.” So, this is tough moral quandary. It’s like, what are you going to do? That’s the situation we have.”

Elon Musk on Episode #1470 of the Joe Rogan Podcast, May 7th, 2020

And now the recent data is seen as boosting this argument further. Health authorities have responded with various retorts to this line of reasoning, though this hasn’t really moved the needle in the sphere of national discussion as much as you’d think.

As famously productive and worthwhile as Twitter threads and YouTube comment sections are for rational and reasoned debate, there’s a simpler way of approaching this. Instead of diving into the weeds of defining “comorbidity”, there’s ultimately one question we want to answer: Are more people currently dying in the United States than would have normally died?

How does the current all-cause death rate compare to the usual all-cause death rate for Americans?

Fortunately, the CDC’s data for the total number of deaths is readily available. This is data straight from the National Center for Health Statistics (NCHS), the organization officially in charge of aggregating data like death reports and compiling them into verified statistics. If you die in the United States, these are the people the attending physician will report the death certificate to. If there truly are an abnormal number of Americans dying, we should see a spike in the counts of deaths. If not, we shouldn’t see anything too out of the ordinary compared to previous years. As a reminder, we’re not talking about the data on deaths from COVID; we’re interested in deaths across all Americans, regardless of age, sex, ethnicity, or cause of death. We’re looking for every instance of an American kicking the bucket, whether it be caused by heart disease, cancer, homicide, being crushed by a vending machine, or anything else that could make somebody join the silent majority.

Below is a code snippet any python programmer can use to get data from the data.cdc.gov API in JSON format. The data is free to the public, though you might start to run into connectivity issues if you’re making thousands of requests without an account.

!pip -qq install pandas
!pip -qq install sodapy
!pip -qq install backports-datetime-fromisoformat

from datetime import date, datetime, time
import random

from backports.datetime_fromisoformat import MonkeyPatch
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from sodapy import Socrata

%config InlineBackend.figure_format = 'retina'
mpl.rcParams["savefig.format"] = 'svg'

# The fromisoformat() method is not available in Python < 3.7
# This 3rd-party package backports it
MonkeyPatch.patch_fromisoformat()

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cdc.gov", None)

# Up to 175,000 results, returned as JSON from the API and converted to a
# Python list of dictionaries by sodapy.
results = client.get("y5bj-9g5w", limit=175000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

results_df["number_of_deaths"] = pd.to_numeric(results_df["number_of_deaths"])

is_whole_us = results_df['state_abbreviation']=='US'
results_df = results_df[is_whole_us]
results_df

This returns a dataframe that shows the number of deaths per week for each age group, which we’ve gone ahead and filtered for the entire United States (we’re not interested in state-by-state breakdowns for now). We’ll do one last check of the unique column values just to be sure.

print(results_df['state_abbreviation'].unique())
print(results_df['year'].unique())
print(results_df['age_group'].unique())
print(results_df['time_period'].unique())
print(results_df['type'].unique())
print(results_df['week'].unique())
week_ending_dates = results_df['week_ending_date'].unique()
print(results_df['note'].unique())
['US']
['2015' '2016' '2017' '2018' '2019' '2020']
['25-44 years' '45-64 years' '65-74 years' '75-84 years'
 '85 years and older' 'Under 25 years']
['2015-2019' '2020']
['Predicted (weighted)' 'Unweighted']
['1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12' '13' '14' '15' '16'
 '17' '18' '19' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '30'
 '31' '32' '33' '34' '35' '36' '37' '38' '39' '40' '41' '42' '43' '44'
 '45' '46' '47' '48' '49' '50' '51' '52']
[nan
 'Data in recent weeks are incomplete. Only 60% of death records are submitted to NCHS within 10 days of the date of death, and completeness varies by jurisdiction.']

We’ve correctly settled on data for the full US (not just individual states), covering 2015 through 2020, along with a handy identifier distinguishing data from before and during the pandemic. It’s not all rosy, though. As you can see from the note, there’s a substantial lag in the submission of death records to the NCHS (only about 60% of records arrive within 10 days of the date of death). This means that the most recent two weeks of data will be unreliable (so we will drop them before the visualization step).

For now, our data is still split among age groups and we want to aggregate across all ages. We want to know the total number of deaths across all Americans for any given week.

for week in week_ending_dates:
    week_cohort = results_df.query('week_ending_date=="{}" & type=="Unweighted" '.format(week))
    total_weekly_deaths = week_cohort.sum(axis=0)['number_of_deaths']
    year = week_cohort.mode(axis=0)['year'][0]
    week_number = week_cohort.mode(axis=0)['week'][0]
    time_period = week_cohort.mode(axis=0)['time_period'][0]
    new_row = {'jurisdiction': 'United States',
               'week_ending_date': week,
               'state_abbreviation': 'US',
               'year': year,
               'week': week_number,
               'age_group': 'All',
               'number_of_deaths': total_weekly_deaths,
               'time_period': time_period,
               'type':'Unweighted',
               }
            
    #append row to the dataframe
    results_df = results_df.append(new_row, ignore_index=True)

# If you want to look for the deaths for specific age groups, just
# change the "All" to one of the age group categories in the list of
# unique values in the printout above
results_df = results_df.query('age_group=="All" ')
results_df = results_df.reset_index()
results_df = results_df.drop(['index'], axis=1)
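
(As an aside, the row-by-row loop above could also be replaced with a single groupby on the raw records. The sketch below is my own alternative, not part of the original pipeline; raw_df and weekly_totals are names I made up, and it reuses the results list fetched earlier.)

# Hypothetical alternative to the append loop above: one groupby on the raw
# records gives the same all-ages weekly totals.
raw_df = pd.DataFrame.from_records(results)
raw_df["number_of_deaths"] = pd.to_numeric(raw_df["number_of_deaths"])
weekly_totals = (
    raw_df.query('state_abbreviation=="US" & type=="Unweighted"')
          .groupby("week_ending_date", as_index=False)["number_of_deaths"]
          .sum()
          .sort_values("week_ending_date")
)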

Now that we have our data, it’s just a simple matter of plotting. Fortunately for us, the design team at FiveThirtyEight already did most of the heavy lifting on the style guide.

week_endings = [datetime.fromisoformat(x) for x in results_df['week_ending_date'].to_list()]

plt.style.use('fivethirtyeight')

fig, ax = plt.subplots(figsize=(30, 10))

x = week_endings[:-2]
y = results_df['number_of_deaths'].to_numpy()[:-2]

ax.bar(x, y, width=7)
ax.set_title('Number of Deaths per week in the United States (all causes, all ages)')
ax.set_ylabel('Number of deaths (all causes)')
ax.set_xlabel('date (week)')
ax.xaxis_date()

plt.savefig('total_deaths.svg')
plt.show()

The total number of people dying in the United States, every week (save for the most recent 2 incomplete weeks)

We can see that looking for spikes in the data isn’t going to be so simple, as there are immediately obvious seasonal spikes in the numbers of deaths. We want to know which of these spikes or valleys shouldn’t be there. We already have a cumulative number of deaths per week across all ages, but we also want the typical number of deaths for any given week of the year (not just the deviation from the all-time average) to compare against. For example, if we have the number of deaths for week 17 out of 52 for 2020, we’d like to know how this compares to the average number of deaths for week 17 out of 52 across 2015-2019. This should allow us to distinguish unusually high or low numbers of deaths from the seasonal variability.

The following script gets those weekly averages and plots the differences between those averages and the total number of deaths.

weekly_averages = {}
for week in week_ending_dates:
    week_cohort = results_df.query('week_ending_date=="{}" '.format(week))
    week_number = week_cohort.mode(axis=0)['week'][0]
    typical_week_cohort = results_df.query('week=="{}" & time_period=="2015-2019" '.format(week_number))
    typical_week_deaths = typical_week_cohort.mean(axis=0)['number_of_deaths']
    weekly_averages.update({int(week_number): typical_week_deaths})


typical_deaths = results_df['week'].apply(lambda x: weekly_averages[int(x)]).to_numpy()

plt.style.use('fivethirtyeight')

# plot it
fig, ax = plt.subplots(figsize=(30, 10))

x = week_endings[:-2]
y = results_df['number_of_deaths'].to_numpy()[:-2] - typical_deaths[:-2]

ax.bar(x, y, width=7)
ax.set_title('Deviation from 2015-2019 Week-average Number of Deaths in the United States (all causes, all ages)')
ax.set_ylabel('Deviation from mean number of deaths for week $N$ (for 2015-2019)')
ax.set_xlabel('date (week)')
ax.xaxis_date()

plt.savefig('death_deviation.svg')
plt.show()

Deviation in the total number of deaths for typical week N of the year

There are definitely spikes in earlier years resulting from nasty flu seasons, but 2020 is clearly experiencing something much worse than a typical bad flu. We could go further by doing additional statistical tests on the data, but this seems like the kind of graph where a career p-value-hacker would say, “Nah, that’s not necessary for this one.”

Adding up the bars from just 2020 (including the negative ones) suggests that around 274,705 additional Americans have died who wouldn’t have otherwise. In fact, it’s highly possible that the current estimates of deaths from COVID are an underestimate.
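
For reference, here is a sketch of that tally, reusing the arrays from the plot above. The exact total will drift as the most recent weeks get backfilled, so treat the number as approximate.

# Sum the weekly deviations (actual minus the 2015-2019 week-of-year average)
# for 2020 only, again dropping the two most recent, incomplete weeks.
deviations = results_df['number_of_deaths'].to_numpy()[:-2] - typical_deaths[:-2]
is_2020 = results_df['year'].to_numpy()[:-2] == '2020'
excess_2020 = deviations[is_2020].sum()
print("Estimated excess deaths so far in 2020: {:,.0f}".format(excess_2020))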

Some of us might be living in sparsely populated areas that haven’t seen much of the effects of the pandemic beyond all the businesses in town suddenly requiring masks. My experience? I was in Brooklyn earlier this year when the hospital 3 blocks down the street from my apartment started having a lot more refrigerated trucks in front of it (being used as mobile morgues after the hospital’s own morgue started to overflow). While it’s true that many of the “mass grave” pictures on social media have been coming from places like Hart Island (already a place for burial of those with no next of kin, and not the only destination for those that have died of COVID), there’s definitely been a dramatic uptick in the number of burials.

“How can we be sure these numbers are trustworthy?”

Unclear autopsy results are one thing, but excess deaths on this scale are another. One can make a case for incentives by hospitals to put down a specific cause of death. It’s a lot harder to believe in the falsification of tens of thousands of death certificates for people who have not actually died (like some inexplicably scaled-up reverse-Weekend-at-Bernie’s plot). Don’t get me started on what could possibly motivate a conspiracy like that, or how such a plan would even be coordinated for months across hundreds of thousands of hospital staff without anybody else catching on.

But criticisms aside, how would we even test whether official figures are accurate (or whether they’re being pulled out of thin air)? As it turns out, there are a lot of statistical techniques for distinguishing real data from fudged data. Many of the techniques in this space draw upon the fact that humans are pretty bad at generating numbers that mimic a real probability distribution. Humans will often favor numbers ending in 5 or 0, or add in certain digits with abnormal frequency because they seem “more random” than others.

COVID death data from the US, Canada, China, and France doesn’t stray too far from the expected distribution of trailing digits. For a negative example, Boris Ovchinnikov used this kind of digit analysis to great effect to spot unusual patterns in Russia’s official COVID-19 statistics. For example, he found that the reported numbers of new cases ended in the digits 99 four times in a span of 25 days. He calculated that the likelihood of this happening by chance is about 1.1%. This was made even more dubious by the fact that the trend in the number of cases followed a linear pattern rather than an exponential one.

What does it look like if we apply techniques like those used by Boris Ovchinnikov to the US death counts? Are there any discrepancies between the digit preference patterns before and during the pandemic?

from collections import Counter

client = Socrata("data.cdc.gov", None)
results = client.get("y5bj-9g5w", limit=175000)
results_df = pd.DataFrame.from_records(results)
results_pre_df = results_df.query('time_period=="2015-2019" & type=="Unweighted" ')
results_post_df = results_df.query('time_period=="2020" & type=="Unweighted" ')

death_pre_list = results_pre_df['number_of_deaths'].tolist()
death_post_list = results_post_df['number_of_deaths'].tolist()
trailing_pre_digits = [str(x)[-2:] for x in death_pre_list]
trailing_post_digits = [str(x)[-2:] for x in death_post_list]

digit_pre_counts = Counter(trailing_pre_digits)
digit_post_counts = Counter(trailing_post_digits)
x = [] # Ones-place (last) digit of each weekly total (index -1)
y = [] # Tens-place (2nd-to-last) digit of each weekly total (index -2)

# The lists of digit counts before and during the pandemic (normalized below)
pre = []
post = []

for i in range(0, 10):
    for j in range(0, 10):
        x.append(j)
        y.append(i)
        pre.append(digit_pre_counts[str(i) + str(j)])
        post.append(digit_post_counts[str(i) + str(j)])

# We will normalize the digit counts as fractions of the total
# (we multiply by 10,000 because otherwise our scatterplot dots will be too
# small to see).
pre_sum, post_sum = sum(pre), sum(post)
pre = [(x / pre_sum)*10000 for x in pre]
post = [(x / post_sum)*10000 for x in post]

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(30, 10))

ax1.scatter(x,y,s=pre)
ax1.set_title("Relative Digit Preference in Weekly Death Totals; 2015-2019\n(pixel-widths of dots correspond to normalized frequency)")
ax1.xaxis.set_ticks(np.arange(0, 10, 1))
ax1.yaxis.set_ticks(np.arange(0, 10, 1))
ax1.set_xlabel('ones-place digit (xxxX)')
ax1.set_ylabel('tens-place digit (xxXx)')

ax2.scatter(x,y,s=post)
ax2.set_title("Relative Digit Preference in Weekly Death Totals; 2020\n(pixel-widths of dots correspond to normalized frequency)")
ax2.xaxis.set_ticks(np.arange(0, 10, 1))
ax2.yaxis.set_ticks(np.arange(0, 10, 1))
ax2.set_xlabel('ones-place digit (xxxX)')
ax2.set_ylabel('tens-place digit (xxXx)')

plt.savefig('digit_preference.svg')
plt.show()

No noticeable differences between distributions of digits, and the data is practically uniform compared to Ovchinnikov’s analysis of Russia’s data

…nope. Nothing really out of the ordinary here, especially not compared to the dramatic spikes and dips in Ovchinnikov’s visualization of the Russian data. In fact, if you were to crop out the tick-marks for the 0s, both also seem like they’d be close to a 2-dimensional Newcomb-Benford distribution (i.e., the distribution of non-zero digits we would expect if these numbers were occurring naturally).

“I still don’t buy it. Can’t you do some more rigorous statistical tests?”

Many of you that have read this far may be unsatisfied with just eyeballing some nice matplotlib plots. After all, as one Twitter message I got put it, “Maybe the CDC is being sneakier than the Russians!”

🤦

Regardless, many of the less paranoid among you will still want actual numbers detailing the likelihoods of this data being fudged (you may already be convinced, but you still want to persuade others). Can we really test for and measure data manipulation like this? We absolutely can. There are many possible approaches we could take, and I’ve seen several of the following “goodness of fit” tests used in digit analysis:

  • Chi-squared test: This is one of the more common “goodness of fit” tests out there. That being said, our data is not really split into bins, the null-hypothesis distribution for those bins wouldn’t actually be normal even if it were, and with as many degrees of freedom as we have in both our expected and observed classes, we might not have enough samples for the observed cases. Therefore, the Chi-squared test is inappropriate for solving this problem.
  • Kolmogorov-Smirnov test: This can be a good test for determining whether data fits a normal distribution. Rather than determining whether particular samples came from a normal distribution, it tells you how likely it is that the distribution as a whole is normal. Our main issue with this test is that we’re not looking for a normal distribution. Therefore, the Kolmogorov-Smirnov test is inappropriate for solving this problem.
  • Anderson-Darling test: Unlike Kolmogorov-Smirnov, this test is more sensitive to deviations in a distribution’s tails (relevant for something like a Newcomb-Benford distribution). Like Kolmogorov-Smirnov, though, it still focuses on the probability of the data being normally distributed (again, we are not looking for that). Therefore, the Anderson-Darling test is inappropriate for solving this problem.
  • Shapiro-Wilk test: This test calculates a $W$ statistic that tells you whether a random sample came from a normally distributed population. The test is recommended for samples up to $n=2000$. Not only do we have more than 2000 samples, this is still a normal-distribution-centric test. Therefore (say it with me this time), the Shapiro-Wilk test is inappropriate for solving this problem.

As you’ve probably observed, none of these tests are really appropriate for digit analysis. I bring this up because I’ve seen too many people use them in digit analysis, and then automatically determine that their results are significant/not-significant based on using the wrong test (that’s even worse than p-value hacking).

Instead, we want to use a test based on the Newcomb-Benford distribution. This is actually the same kind of distribution Ben Affleck’s character in The Accountant used, believe it or not.

I would argue that the scenes in this movie of getting fraud detection done by hand (without a python script, no less) are more badass than the action scenes.

This is a probability distribution that describes naturally-occurring digits. The reasoning is that not all digits are expected to appear equally often. This applies to many kinds of numbers, not just the buttons on your microwave. At the risk of grossly oversimplifying, we can explain this by the fact that the mantissas of the logarithms of exponentially-distributed numbers are typically uniformly distributed. In other words, across orders of magnitude, the fractional parts of those logarithms are uniformly distributed (by “fractional part” we mean the part after the decimal point; for $e = 2.71828...$, the fractional part is the $0.71828...$). If we have an arbitrary base-10 number $x$, with $a$ being the integer part of $\log_{10} x$ and $b$ being the significand of $x$ (so that $\log_{10} b$ is the mantissa), we can write this logarithm as

$$\log_{10} x = \log_{10} (10^a \cdot b) = a + \log_{10} b \text{, where } 1 \leq b < 10$$

This might seem complex at first, but aside from the word “mantissa”, we haven’t really used any math newer than the laws of logarithms you learned in middle school. From this logarithm identity, we can derive the following probability distribution for first digits:

$$P(d) = \log_{10} (d+1) - \log_{10} d = \log_{10} \left(1 + \frac{1}{d}\right) \text{, where } d = 1, 2, \ldots, 9$$

Simplifying this even further, we’d expect the distributions of first digits to look like the following ($d$ is the first digit from 1 to 9, with $P(d)$ being the probability of sampling digit $d$):

$d$      1        2        3        4       5       6       7       8       9
$P(d)$   30.1%    17.6%    12.5%    9.7%    7.9%    6.7%    5.8%    5.1%    4.6%
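
If you want to double-check that table, the first-digit probabilities fall straight out of the formula above:

# Reproduce the first-digit probabilities from P(d) = log10(1 + 1/d)
for d in range(1, 10):
    print(d, "{:.1%}".format(np.log10(1 + 1 / d)))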

All we really need to do to figure out if something’s out of the ordinary is to measure the Euclidean distance between the predicted and actual frequencies (sometimes given the unnecessarily fancy name of the $d^*$-factor). We can calculate this fairly simply as

$$d^* = \sqrt{\frac{\sum_{d=1}^{9} \left(\tilde P(d) - P(d)\right)^2}{1.03606}}$$

where $d$ is the aforementioned digit, $1.03606$ is the maximum possible distance (used for normalization), and $\tilde P(d)$ is the observed probability of encountering the digit in real-world data. If a dataset follows a Newcomb-Benford distribution exactly, $d^*$ will be 0.0. The larger the distance from that distribution, the closer $d^*$ gets to 1.0.

Unfortunately, when it comes to defining a cutoff where we can say “yes, there’s definitely been data manipulation”, that’s not quite an exact science. W. Goodman suggested that $0.25$ is a good-enough cutoff. If your reaction to reading this is that it’s a pretty flippant way of creating a decision boundary, let me remind you that this isn’t too far off from how $0.05$ was settled on as a significance cutoff for many statistical tests because Ronald Fisher said in 1925, “Eh, 1/20 sounds good enough” (paraphrasing a lot here, but that’s what he meant).
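
As a rough point of reference (my own aside, not something from the original analysis): the $d^*$ distance between an exact Newcomb-Benford distribution and a perfectly uniform distribution over first digits 1-9 comes out to roughly 0.23, just under that cutoff.

# Rough reference point: d* between an exact Newcomb-Benford distribution and
# a uniform distribution over the first digits 1-9 (comes out to ~0.229).
benford = np.array([np.log10(1 + 1 / d) for d in range(1, 10)])
uniform = np.full(9, 1 / 9)
print(round(float(np.sqrt(np.sum((benford - uniform) ** 2) / 1.03606)), 3))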

We’ll start our test by creating normalized counts of our digits from 1 to 9:

# Reconstruct the raw digit-pair counts from the normalized scatterplot values,
# arranged as a 10x10 grid: rows = tens-place digit, columns = ones-place digit.
expected_df = pd.DataFrame(np.asarray([int((x / 10000) * pre_sum) for x in pre]).reshape((10, 10)))
observed_df = pd.DataFrame(np.asarray([int((x / 10000) * post_sum) for x in post]).reshape((10, 10)))
# Collapse the grid over the ones-place digit (axis=1), drop the 0 row, and
# normalize, giving frequencies over the digits 1-9.
post_2020_sum = observed_df.sum(axis=1)[1:].sum()
pre_2020_sum = expected_df.sum(axis=1)[1:].sum()
post_2020_freqs = observed_df.sum(axis=1)[1:].div(post_2020_sum).values
pre_2020_freqs = expected_df.sum(axis=1)[1:].div(pre_2020_sum).values

By default, we can calculate a perfect set of Newcomb-Benford-based expected values from scratch each time for comparison (it’s only 9 digits, no more or less, so that $O(N)$ calculation is effectively $O(1)$ compared to our more varied inputs). However, we’ve also added the option of comparing the observed digit distribution to other digit distributions (e.g., the digit distribution of the previous 5 years).

Why do we put this option in? As useful as Benford’s law is, there are plenty of digit distributions that are known to disobey it.

For example, in the 1960 and 1970 censuses, in populations above 2500 from 5 US states, only 19% began with digit 1 but 20% began with digit 2. This happened not because of any intentional fraud, but because data for populations under 2500 happened to be truncated.

What other kinds of distributions naturally disobey Benford’s law? Pathology reports, as it turns out. Plenty of rounding goes on in pathology reports, so a similar source of bias is introduced. This is why we introduce two options for our expected digit frequencies: Our digit distribution may not follow Benford’s law exactly, but as an extra measure we can also compare it to digit frequencies from before the Pandemic.

def benford_d_test(observed_frequencies, expected_frequencies=[]):
    """Calculates a Benford d-star test for digit frequencies

    Args:
        observed_frequencies (list): list of observed digit frequencies (must be
          equal to the length of the list of expected frequencies, for which the
          default length is 9)
        expected_frequencies (list): List of expected frequencies. Default
          is an empty list. If this list is empty, a frequency list based on
          a Newcomb-Benford distribution will be calculated exactly from
          scratch

    Returns:
        numpy.float64: Euclidean distance of observed frequencies from
          expected frequencies.
    """
    if not expected_frequencies:
        expected_frequencies = [np.log10(1 + 1 / digit) for digit in np.arange(1, 10)]
    assert (len(observed_frequencies) == len(expected_frequencies)), "Number of observed frequencies must equal expected frequencies"
    d_star_sum = 0.0
    for i in range(0, len(expected_frequencies)):
        d_star_sum = (d_star_sum + (observed_frequencies[i] - expected_frequencies[i]) ** 2)

    d_star_sum /= 1.03606
    d_star_sum = np.sqrt(d_star_sum)

    return d_star_sum

So how do all these comparisons pan out?

print("d* for 2020 1st digits on Newcomb-Benford distribtuion: \t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist())))
print("d* for 2020 1st digits on uniform distribution: \t\t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist(), [1./9] * 9)))
print("d* for 2015-2019 1st digits on Newcomb-Benford distribtuion: \t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist())))
print("d* for 2015-2019 1st digits on uniform distribution: \t\t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist(), [1./9] * 9)))
print("d* for 2020 1st digits on 2015-2019 digit distribution: \t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist(), pre_2020_freqs.tolist())))
print("d* for 2015-2019 1st digits on 2020 digit distribution: \t\t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist(), post_2020_freqs.tolist())))
d* for 2020 1st digits on Newcomb-Benford distribtuion: 		0.165
d* for 2020 1st digits on uniform distribution: 			0.080
d* for 2015-2019 1st digits on Newcomb-Benford distribtuion: 	0.160
d* for 2015-2019 1st digits on uniform distribution: 		0.097
d* for 2020 1st digits on 2015-2019 digit distribution: 		0.024
d* for 2015-2019 1st digits on 2020 digit distribution: 		0.024

Even with the Benford’s-law-deviating bias, our Euclidean distances for both the 2020 digits and the pre-2020 digits are well under the (admittedly arbitrary) data-manipulation cutoff. Compare this to Benford’s law $d^*$ values like 0.30 for Russia’s COVID-specific death data.

Another problem with the Russian data was that it also had low distance scores against a uniform digit distribution. This was in part due to counts ending in 2 and 9 appearing with similar frequency. The source of this odd bias may be the same one that was causing the death and case counts to grow linearly instead of exponentially or logistically (as the kids say these days, “sounds kind of sus”).

How do these total deaths for the US compare to the Russian COVID-specific deaths? Oddly enough, while the Russian data had a distance score of 0.130 for uniform random distributions, the US 2020 data has a distance score of 0.080. Wait, WHAAAAT??? How can this be? Is there something fishy about the US data?

Before jumping to conclusions faster than a YouTube conspiracy theorist, consider that the US all-cause deaths from before the pandemic have a distance score of 0.097, all while having a Benford’s law distance score of 0.16 (very close to our 2020 data). In fact, if we compare the digit frequencies from before and during the pandemic, we get a euclidean distance score of only 0.024 (the reverse calculation is simply to show that this euclidean distance metric is symmetric, unlike tools like Kullback–Leibler divergence).
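
To make that symmetry point concrete, here’s a tiny illustration (my own aside, using made-up toy distributions rather than the CDC data):

# Euclidean-style distance is the same in both directions; KL divergence
# generally is not. p and q are arbitrary toy distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def kl_divergence(a, b):
    return np.sum(a * np.log(a / b))

print(euclidean(p, q), euclidean(q, p))          # identical
print(kl_divergence(p, q), kl_divergence(q, p))  # slightly different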

Can we go beyond the 1st digits? Does this apply to the other digits?

If you’re asking this, I’m even more certain that you’re at one of the extremes of being either super-paranoid or already convinced and wanting me to dive deeper into the math.

Nonetheless, the answer is yes. We can absolutely go beyond the first digit.

Heck, we can even go to the 3rd digit and beyond.

For a string of digits of any length, the probability of encountering a number starting with that string $n$ (discarding leading zeros, of course) is $\log_{10}(n+1) - \log_{10}(n) = \log_{10}\left(1 + \frac{1}{n}\right)$. For example, the odds of a number starting with 1, 3, 3, 7 would be $\log_{10}(1 + 1/1337) \approx 0.0003247...$. As you’ve probably guessed by now, if we want to calculate the chances that the 2nd digit is a 4 (for example), we add up the probabilities of all the cases where that happens:

$$\log_{10}\left(1+\frac{1}{14}\right)+\log_{10}\left(1+\frac{1}{24}\right)+\cdots+\log_{10}\left(1+\frac{1}{94}\right) \approx 0.100308...$$

In fact, if you want to calculate the odds that any digit $d$ ($0, 1, 2, \ldots, 9$) is encountered at the $n$th ($n > 1$) position, the formula is

$$\sum_{k=10^{n-2}}^{10^{n-1}-1}\log_{10}\left(1+\frac{1}{10k+d}\right)$$

If how this plays out for later digit positions isn’t intuitive (I’m actually very curious who it is intuitive to), the values for the 1st through 4th digit positions ($n = 1, 2, 3, 4$) look like this:

Digit ($d$)        0       1       2       3       4       5       6       7       8       9
$P(d \mid n=1)$    N/A     30.1%   17.6%   12.5%   9.7%    7.9%    6.7%    5.8%    5.1%    4.6%
$P(d \mid n=2)$    12.0%   11.4%   10.9%   10.4%   10.0%   9.7%    9.3%    9.0%    8.8%    8.5%
$P(d \mid n=3)$    10.2%   10.1%   10.1%   10.1%   10.0%   10.0%   9.9%    9.9%    9.9%    9.8%
$P(d \mid n=4)$    10.0%   10.0%   10.0%   10.0%   10.0%   10.0%   10.0%   10.0%   10.0%   10.0%
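
If you want to sanity-check that table yourself, the general formula above is easy to evaluate directly (a small sketch; digit_position_prob is just a throwaway helper name of my own):

# Probability of digit d appearing at position n (n > 1) of a
# Benford-distributed number, straight from the formula above.
def digit_position_prob(d, n):
    ks = np.arange(10 ** (n - 2), 10 ** (n - 1))
    return np.sum(np.log10(1 + 1 / (10 * ks + d)))

# Reproduces the n=2 row of the table:
print(["{:.1%}".format(digit_position_prob(d, 2)) for d in range(10)])
# -> 12.0%, 11.4%, 10.9%, 10.4%, 10.0%, 9.7%, 9.3%, 9.0%, 8.8%, 8.5%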

Notice any patterns now? As you move from the 1st digit to the 2nd to the 3rd to the 4th and so on, the digit distribution starts to resemble a uniform random distribution (all probabilities being 10%) more and more. It’s a cool trick, but it does mean that differences in distance scores between the Benford and uniform distributions become less and less meaningful, to the point of being pretty useless for the 3rd and 4th digits.

We can still compare the distributions of 2nd digits from before and during 2020.

# Same reconstruction as before, but this time collapse the 10x10 grid over the
# tens-place digit (axis=0), keeping all ten ones-place digits 0-9.
expected_df = pd.DataFrame(np.asarray([int((x / 10000) * pre_sum) for x in pre]).reshape((10, 10)))
observed_df = pd.DataFrame(np.asarray([int((x / 10000) * post_sum) for x in post]).reshape((10, 10)))
post_2020_sum = observed_df.sum(axis=0).sum()
pre_2020_sum = expected_df.sum(axis=0).sum()
post_2020_freqs = observed_df.sum(axis=0).div(post_2020_sum).values
pre_2020_freqs = expected_df.sum(axis=0).div(pre_2020_sum).values

Previously we could calculate the expected distribution from scratch for each run of the function, because for the 1st digit there are only 9 terms to evaluate. For $n = 2, 3, 4, \ldots$ that’s probably excessive, especially considering we’re just approaching a uniform distribution, so we’ll hard-code the expected second-digit frequencies instead.

second_digit_freqs = [0.120, 0.114, 0.109, 0.104, 0.100, 0.097, 0.093, 0.090, 0.088, 0.085]

print("d* for 2020 2nd digits on Newcomb-Benford distribtuion: \t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist(), second_digit_freqs)))
print("d* for 2020 2nd digits on uniform distribution: \t\t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist(), [1./10] * 10)))
print("d* for 2015-2019 2nd digits on Newcomb-Benford distribtuion: \t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist(), second_digit_freqs)))
print("d* for 2015-2019 2nd digits on uniform distribution: \t\t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist(), [1./10] * 10)))
print("d* for 2020 2nd digits on 2015-2019 digit distribution: \t\t{:.3f}".format(benford_d_test(post_2020_freqs.tolist(), pre_2020_freqs.tolist())))
print("d* for 2015-2019 2nd digits on 2020 digit distribution: \t\t{:.3f}".format(benford_d_test(pre_2020_freqs.tolist(), post_2020_freqs.tolist())))
d* for 2020 digits on Newcomb-Benford distribtuion: 		0.035
d* for 2020 digits on uniform distribution: 			0.014
d* for 2015-2019 digits on Newcomb-Benford distribtuion: 	0.031
d* for 2015-2019 digits on uniform distribution: 		0.010
d* for 2020 digits on 2015-2019 digit distribution: 		0.011
d* for 2015-2019 digits on 2020 digit distribution: 		0.011

As we anticipated, the differences in $d^*$ for both the Newcomb-Benford and uniform comparisons are small. Both are almost a full order of magnitude below the $d^* = 0.25$ cutoff we specified earlier, with the pre/post-2020 distances and the uniform distances being even smaller.

Now, back to the original point of this post…

If you’re trying to distract yourself from this grim-looking histogram, I’m not going to let you look away so easily.

What does this all mean?

It means what health officials have been saying for some time now: that we have more than 200K additional Americans dead this year (which isn’t even over, by the way), and that digit analysis points to large-scale data manipulation (or even any major difference in reporting patterns before versus during 2020) being extremely unlikely.

“Okay, if a lot more people are dying, and the numbers are genuine, what’s going on with the autopsies?”

Given that my pre-software career was focused on biological aging (you can read more of my posts on that subject here), I can talk pretty confidently at length about aging, morbidity, death, and decay. Early in college, when I was touring medical schools and seeing medical cadavers, I got to witness firsthand how much better you can understand what’s going on in somebody’s body when you don’t care about how invasive you’re being. Cutting someone open leaves a lot less room for ambiguity than any non-invasive medical imaging. It’s pretty common for the autopsy technician to find comorbidities that the person on the table didn’t know about in life. I’ve heard examples from medical examiners ranging from previously undiagnosed genetic diseases, to hidden cases of liver disease, to surprise MRSA infections, to non-smokers whose lungs were unexpectedly grey, to sepsis caused by getting pricked with a paperclip, to lethal blood clots that, upon closer inspection, had been a long time coming. The list goes on and on.

When I hear that the percentage of autopsy reports with one or more “comorbidities” is 94%, that fits pretty neatly with my worldview that the majority of us are probably not as healthy as we think we are. When we picture the body’s innards, many of us will probably picture something that’s pristine like a textbook. Even those of us in the biological sciences might have a mental model where all the tissues and organs are neat, organized, labelled, and don’t look like the diseased tissues you see in the pictures in later chapters of your anatomy textbook. Living human bodies (except for maybe infants) are not like this, and this is especially true for corpses. True, when you look inside a body for the first time there might be more truth to the color-coding in your textbook than you expected, but beyond that things are a lot more messy and varied. If you want a better sense for this and aren’t good friends with a mortician or medical examiner, I’d recommend checking out Body Worlds (WARNING: This Link is the webpage for an exhibit containing skinless cadavers preserved in plastic. NSFW, and possibly even NSFL, obviously).

For a less gruesome comparison, it’s like we’re all players in a game of Dungeons and Dragons (played by a truly sadistic GM). Many of these comorbidities (from high blood pressure to being diabetic or pre-diabetic) act as constitution penalties. Even if you’re pretty confident that you don’t have any of these, any D&D player can tell you that you can still fail your saving throw, and 0.2% is not as low as you think when we’re talking about chances of dying.

TL;DR “Is the 200K+ deaths figure genuine?” The answer is yes.


Cited as:

@article{mcateer2020autopsies,
  title   = "Excess COVID-19 Deaths",
  author  = "McAteer, Matthew",
  journal = "matthewmcateer.me",
  year    = "2020",
  url     = "https://matthewmcateer.me/blog/excess-covid-deaths/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄
