The 2020 Election and The Probability of Electoral Mis-Alignment

In American politics, it is uncommon for a presidential candidate to win the Electoral College while losing the popular vote, but when it happens, one can be sure there will be plenty of soul-searching commentary on this fairly intentional aspect of how the country selects its executive leader.

Reviewing the history of the Electoral College, it is apparent that the majority will of the people was something the Framers of the Constitution were deeply suspicious of:

Direct election was rejected not because the Framers of the Constitution doubted public intelligence but rather because they feared that without sufficient information about candidates from outside their State, people would naturally vote for a “favorite son” from their own State or region. At worst, no president would emerge with a popular majority sufficient to govern the whole country. At best, the choice of president would always be decided by the largest, most populous States with little regard for the smaller ones.
https://uselectionatlas.org/INFORMATION/INFORMATION/electcollege_history.php

Thus a system of “Electors” was chosen as a way to soften the influence of a raw-numbers-based demographic majority. This is the same reason America has a bicameral legislature with one chamber’s members not apportioned by the population of the associated state.

Of course, the country (not to mention the Electoral College itself) has changed quite a bit since its founding. Mathematical gerrymandering. Multiple Voting Rights Acts. Expansive networks of lobbyists and Political Action Committees. And, most prominently, a wider sense of who exactly is included in “The People” of the country. There have been so many transformations to what Americans see as their nation and its rules that it seems fair to ask whether the anti-popular spirit of the original Electoral system still applies.

In other words, there are excellent reasons to not be beholden to the political vision of people who lived two and a half centuries before you and who had contradictory and largely self-serving definitions of liberty and equality. However, it is also important to recognize that the Electoral College is serving its original purpose when it leads (albeit rarely) to an outcome counter to that of the popular vote.


Electoral Misalignment

OK so we know the Electoral College was never meant to align perfectly with the direction of the popular vote, but the two go in the same direction often enough that it is noteworthy when they don’t. This leads to a basic question:

Can we be more rigorous about how likely such a lack of alignment is in a given election?

Namely, for a given election (i.e., its polls, turnout stats, count of electors, etc.) can we determine the probability that whoever wins the Electoral College will also lose the popular vote?

To better outline this situation, we can represent the voting outcomes for a single candidate at the end of Election day.

Either the candidate wins both the popular vote and the Electoral College, the candidate loses both, or the candidate wins one and loses the other¹. We call these “win-loss” situations “Electoral Misalignment,” and we want to compute their total probability.

To do so, we will begin by outlining some naive but convenient theoretical assumptions of this calculation and will then focus on the 2020 election. We will pretend that we are just one day prior to that election day with all the then-current polling data. Here’s what we want to know:

What was the misalignment probability in the 2020 election?
Was the misalignment probability higher for a Democratic presidential win or a Republican presidential win?
What does this suggest about the demographics of electoral politics in 2020 and potentially in the future?

The Model

There are undoubtedly many models of presidential elections, but we will start from scratch, laying out the assumptions and alternative routes (which we will not take) in building the final model.

We assume there are only two candidates running for president: a Democratic candidate represented by \(D\) and a Republican candidate represented by \(R\). In actual elections, there are often third-party candidates, but we will assume that we can split these candidates’ votes among the two main candidates in ways that don’t change the vote differences for each state². We define \(M\) as the number of “states” and label each state with \(\alpha = 1, \ldots, M\). We put “states” in quotation marks to note that the \(M\) entities include not only states like Texas or California, but also the District of Columbia and the congressional districts of Nebraska and Maine, which award electors separately. Henceforth, we will write states without quotation marks to represent this larger set.

The results of an election can be written as \(\textbf{n}_D = (n_{D, 1}, n_{D, 2}, \ldots, n_{D, M})\) and \(\textbf{n}_R= (n_{R, 1}, n_{R, 2}, \ldots, n_{R, M})\) representing, respectively, the votes for the Democratic and the Republican candidate in each of the \(M\) states.

Our larger objective is to calculate the probability that a candidate will win the popular vote, the electoral vote, or generally any combination of winning one and losing the other. In order to do this, we need to associate probabilities with the particular votes for each state and define what it means to win the popular vote and the electoral vote.

Popular Win

In a popular vote system, the candidate who receives the most votes across all states wins. We denote by \(n_D \equiv \sum_{\alpha=1}^M n_{D, \alpha}\) and \(n_R \equiv \sum_{\alpha=1}^M n_{R, \alpha}\) the number of total votes for the Democratic and Republican candidates, respectively. Therefore we can write the condition for the Democratic candidate to win the popular vote as

\begin{equation}
\sum_{\alpha=1}^M \big(n_{D, \alpha} - n_{R, \alpha}\big)> 0, \qquad \text{[\(D\) Popular Vote Win]} \qquad (1)
\end{equation}

Electoral College Win

In the Electoral College system, each state gets a certain number of “electors,” and if one candidate wins a majority³ of the vote in that state, then all of the electors vote for that candidate. The candidate with the most electoral votes across all states is deemed the winner of the Electoral College and the presidential election.

We define \(\lambda_{\alpha}\) as the number of electoral votes for state \(\alpha\) (e.g., \(\lambda_{\text{Texas}} = 38\) ). We denote \(e_D\) and \(e_R\) as the total number of electoral votes for the Democratic and Republican candidates, respectively. Using the Heaviside step function \(\Theta(x)\) defined as

\begin{equation}
\Theta(x) = \begin{cases}1 & \text{for \(x>0\)} \\[0.5em] 0 & \text{otherwise} \end{cases} \qquad (2)
\end{equation}

we can write the number of electoral votes for the Democratic and Republican candidates as

\begin{equation}
e_D = \sum_{\alpha=1}^M \lambda_{\alpha} \Theta\left(n_{D, \alpha} \,- n_{R, \alpha}\right) \qquad e_R = \sum_{\alpha=1}^M \lambda_{\alpha} \Theta\left(n_{R, \alpha} \,- n_{D, \alpha}\right). \qquad (3)
\end{equation}

For the Democratic candidate to win, we must have \(e_D> e_R\), or

\begin{equation}
\sum_{\alpha=1}^M \lambda_{\alpha}H\big(n_{D, \alpha} \,- n_{R, \alpha}\big)> 0, \qquad \text{[\(D\) Electoral College Win]} \qquad (4)
\end{equation}

where we defined \(H(x) \equiv \Theta(x) - \Theta(-x)\).
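As a quick check of Eq.(1), Eq.(3), and Eq.(4), here is a minimal sketch with three made-up states. The vote counts and elector counts are hypothetical and were chosen only so that the two win conditions disagree.

```python
import numpy as np

# Hypothetical vote counts and electors for three made-up states
n_D = np.array([600_000, 480_000, 490_000])   # Democratic votes per state
n_R = np.array([300_000, 520_000, 510_000])   # Republican votes per state
lam = np.array([10, 8, 8])                    # electors per state

def H(x):
    # H(x) = Theta(x) - Theta(-x): +1 for a D win, -1 for an R win, 0 for a tie
    return np.heaviside(x, 0) - np.heaviside(-x, 0)

popular_margin = np.sum(n_D - n_R)            # left-hand side of Eq.(1)
electoral_margin = np.dot(lam, H(n_D - n_R))  # left-hand side of Eq.(4)

d_wins_popular = popular_margin > 0
d_wins_electoral = electoral_margin > 0
print(popular_margin, electoral_margin, d_wins_popular, d_wins_electoral)
```

In this toy case the Democratic candidate runs up a large margin in one state and narrowly loses the other two, so the popular margin is positive while the electoral margin is negative.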

Total Votes and Margin

Eq.(1) and Eq.(4) give us the conditions for a Popular vote and Electoral College win in terms of the votes for Democratic and Republican candidates, but for the models we build later, it will prove more useful to use a different set of variables. We define \(n_{\alpha}\) as the total number of votes in a state and \(\delta_{\alpha}\) as the difference in the fraction of votes between the Democratic and Republican candidates:

\begin{equation}
n_{\alpha} \equiv n_{D, \alpha} + n_{R, \alpha}, \qquad \delta_{\alpha} \equiv \frac{n_{D, \alpha} - n_{R, \alpha} }{n_{D, \alpha} + n_{R, \alpha} }\qquad (5)
\end{equation}

In polling speak \(n_{\alpha}\) is the raw “turnout” or “number of ballots” cast in an election and \(\delta_{\alpha}\) is the “margin of victory (or loss)” for the Democratic candidate in state \(\alpha\). Inverting the system in Eq.(5) to solve for \(n_{D, \alpha}\) and \(n_{R, \alpha}\) in terms of \(n_{\alpha}\) and \(\delta_{\alpha}\), we find that the popular vote and electoral college win conditions Eq.(1) and Eq.(4) become, respectively,

\begin{equation}
\sum_{\alpha=1}^M n_{\alpha} \delta_{\alpha} > 0, \qquad \sum_{\alpha=1}^M \lambda_{\alpha} H(\delta_{\alpha}) > 0 \qquad (6)
\end{equation}

where we used the identity \(\Theta(c \delta_{\alpha}) = \Theta(\delta_{\alpha})\) for \(c>0\). Alternatively, using vector notation \(\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_M)\), we can write simpler popular vote and electoral college win conditions as

\begin{equation}
\textbf{n}\cdot \boldsymbol{\delta} > 0, \qquad \boldsymbol{\lambda} \cdot H(\boldsymbol{\delta}) > 0, \qquad (7)
\end{equation}

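where \(\textbf{n} = (n_1, n_2, \ldots, n_M)\), \(\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_M)\), and \(H\) acts elementwise, \(H(\textbf{x}) \equiv (H(x_1), H(x_2), \ldots, H(x_M))\). Finally, using the Heaviside function Eq.(2), we can write these two conditions as binary yes or no (i.e., “1” or “0”) functions: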

\begin{align}
\text{Democrat Popular Win} &= \Theta\left(\textbf{n}\cdot \boldsymbol{\delta}\right) \qquad (8a) \\[.75em]
\text{Democrat Electoral College Win} &= \Theta\left(\boldsymbol{\lambda} \cdot H(\boldsymbol{\delta}) \right) \qquad (8b)
\end{align}
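For completeness, the inversion of Eq.(5) that leads to Eq.(6) can be written out explicitly. The following is a worked step using only the definitions above:

\begin{equation}
n_{D, \alpha} = \frac{n_{\alpha}}{2}\big(1 + \delta_{\alpha}\big), \qquad n_{R, \alpha} = \frac{n_{\alpha}}{2}\big(1 - \delta_{\alpha}\big), \qquad \text{so that} \qquad n_{D, \alpha} - n_{R, \alpha} = n_{\alpha}\,\delta_{\alpha}.
\end{equation}

Substituting the last relation into Eq.(1) gives the popular vote condition in terms of \(n_{\alpha}\delta_{\alpha}\). Because \(n_{\alpha} > 0\), the sign of \(n_{\alpha}\delta_{\alpha}\) is the sign of \(\delta_{\alpha}\), so \(H(n_{\alpha}\delta_{\alpha}) = H(\delta_{\alpha})\) and Eq.(4) depends on the margins alone.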

Probabilistic Models

The benefit in writing the conditions for a Popular or Electoral College win in terms of \(\boldsymbol{\delta}\) and \(\textbf{n}\) is that we can use polling data and previous voter turnout data to build probabilistic models for the election day values of these quantities. Both win-margin and the total number of ballots in a state can be seen as random variables in that we do not have enough information to precisely specify them on the day of the election, but we can make best guesses as to their spaces of possible values using well-chosen assumptions and some historical data.

First, we define \(\rho_{0}(\boldsymbol{\delta}, \textbf{n})\) as the probability distribution for the margin and vote count vectors. With this probability density, we can use Eq.(8) to write the probability for a Democratic win of the popular vote as

\begin{equation}
\text{Prob}\,(n_D > n_R) = \int_{\Omega^M_{\text{margin}}} d^M \boldsymbol{\delta} \int_{\Omega^M_{\text{votes}}}d^M \textbf{n}\, \rho_{0}(\boldsymbol{\delta}, \textbf{n}) \,\Theta\left(\boldsymbol{\delta}\cdot \textbf{n}\right),
\end{equation}

where \(\Omega^M_{\text{margin}} = [-1, 1]^{M}\) and \(\Omega^M_{\text{votes}} = \mathbb{R}_{+}^M\) are the domains of integration for the margin and total number of votes, respectively.

Second, we will make three simplifying assumptions for the distribution \(\rho_0\):

Margin and Vote Count Independence: The random variables \(\delta_{\alpha}\) and \(n_{\alpha}\) for the same state are independent (i.e., the margin of win and the number of ballots are independent)⁴.
State Independence: The random variables \(n_{\alpha}\), \(\delta_{\alpha}\) and \(n_{\alpha’}\), \(\delta_{\alpha’}\) for \(\alpha \neq \alpha’\) are independent (i.e., different states have independent distributions for vote count and margin).
Normality: Both random variables \(\delta_{\alpha}\) and \(n_{\alpha}\) are normally distributed.

The first assumption allows us to factor the distribution between margin and vote counts:

\begin{equation}
\rho_{0}(\boldsymbol{\delta}, \textbf{n}) = \rho_{\text{margin}}(\boldsymbol{\delta}) \rho_{\text{votes}}(\textbf{n})
\end{equation}

The last two assumptions then allow us to model the probability distributions for the margin and for the vote counts across states as

\begin{equation}
\rho_{\text{margin}}(\boldsymbol{\delta}) \equiv \prod_{\alpha=1}^M \frac{1}{\sqrt{2\pi \sigma_{\alpha}^2}} e^{-(\delta_{\alpha}- \mu_{\alpha})^2/2\sigma_{\alpha}^2}, \qquad \rho_{\text{votes}}(\textbf{n}) \equiv \prod_{\alpha=1}^M \frac{1}{\sqrt{2\pi s_{\alpha}^2}} e^{-(n_{\alpha}- m_{\alpha})^2/2s_{\alpha}^2}. \qquad (9)
\end{equation}

The parameters \(\mu_{\alpha}\) and \(\sigma_{\alpha}\) are the expected value and the standard deviation of the Democratic candidate’s margin of victory (or loss) for state \(\alpha\). And \(m_{\alpha}\) and \(s_{\alpha}\) are the expected value and the standard deviation for the total number of ballots cast in the state \(\alpha\) for the two candidates.
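To make Eq.(9) concrete, the sketch below draws one simulated election for three hypothetical states. All parameter values are invented for illustration, and the use of `numpy.random.default_rng` is a choice made here, not necessarily what the full notebook does.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical parameters for three made-up states
mu = np.array([0.05, -0.02, 0.10])       # expected D margin, mu_alpha
sigma = np.array([0.02, 0.03, 0.01])     # margin standard deviation, sigma_alpha
m = np.array([3.0e6, 1.5e6, 8.0e5])      # expected ballots cast, m_alpha
s = np.array([1.0e5, 5.0e4, 2.0e4])      # ballot standard deviation, s_alpha

# Independence across states and between margin and turnout means
# each component can be drawn separately, as in the product form of Eq.(9)
delta = rng.normal(loc=mu, scale=sigma)  # one sampled margin vector
n = rng.normal(loc=m, scale=s)           # one sampled turnout vector

print(delta, n)
```

Strictly speaking, a normal distribution assigns some probability to margins outside \([-1, 1]\) and to negative turnout. For parameters of this size that probability is negligible, which is what makes the normal approximation workable.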

With the probability distributions defined in Eq.(9) and the conditions of the popular vote or Electoral College win in Eq.(8), we can now write expressions for the probabilities of the Democratic candidate winning the popular vote or the Electoral College. For the popular vote, the win probability is⁵

\begin{equation}
\text{Prob}\,(n_D > n_R) = \int_{\mathbb{R}^{2M}} d^M \boldsymbol{\delta}\,d^M \textbf{n}\, \rho_{\text{margin}}(\boldsymbol{\delta})\,\rho_{\text{votes}}(\textbf{n}) \,\Theta\left(\boldsymbol{\delta}\cdot \textbf{n}\right) \qquad \text{[\(D\) Popular Vote Win]} \qquad (10)
\end{equation}

and for the Electoral College, the win probability is

\begin{equation}
\text{Prob}\,(e_D > e_R) = \int_{\mathbb{R}^{M}} d^M \boldsymbol{\delta}\, \rho_{\text{margin}}(\boldsymbol{\delta})\,\Theta\big(\boldsymbol{\lambda}\cdot H(\boldsymbol{\delta})\big).\qquad \text{[\(D\) Electoral College Win]} \qquad (11)
\end{equation}

We note that given our first simplifying assumption (i.e., independence of voter turnout and margin), the probability of Electoral College win Eq.(11) is independent of the total number of votes cast in each state.

Now, the question that motivated this discussion went beyond the win-or-lose probability for each branch of the two voting systems. Instead, we wanted to know the probability that the voting systems were misaligned, namely that one candidate would win according to one and lose according to the other. We can again use our probability formalism to calculate this quantity. First, we note that the total misalignment probability consists of the sum of two terms:

\begin{equation}
\text{Misalignment Probability} = \text{Prob}\,(e_D > e_R \cap n_D < n_R)+\text{Prob}\,(e_D < e_R \cap n_D > n_R) \qquad (12)
\end{equation}

This is the probability that one candidate wins the electoral and loses the popular plus the probability that the other candidate gets the same outcome. In terms of our above probability distributions, the quantities that make up this expression can be written as

\begin{equation}
\text{Prob}\,(e_D > e_R \cap n_D < n_R) = \int_{\mathbb{R}^{2M}} d^M \boldsymbol{\delta}\,d^M \textbf{n}\, \rho_{\text{margin}}(\boldsymbol{\delta})\,\rho_{\text{votes}}(\textbf{n}) \,\Theta\big(\boldsymbol{\lambda}\cdot H(\boldsymbol{\delta})\big)\Theta\left(-\boldsymbol{\delta}\cdot \textbf{n}\right) \qquad (13)
\end{equation}

\begin{equation}
\text{Prob}\,(e_D < e_R \cap n_D > n_R) = \int_{\mathbb{R}^{2M}} d^M \boldsymbol{\delta}\,d^M \textbf{n}\, \rho_{\text{margin}}(\boldsymbol{\delta})\,\rho_{\text{votes}}(\textbf{n}) \,\Theta\big(\boldsymbol{\lambda}\cdot H(-\boldsymbol{\delta})\big)\Theta\left(\boldsymbol{\delta}\cdot \textbf{n}\right), \qquad (14)
\end{equation}

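where we flipped the sign of \(\boldsymbol{\delta}\) wherever the Republican candidate is the winner: inside \(\Theta\) in Eq.(13), so that a Republican popular vote win gives a positive argument, and inside \(H\) in Eq.(14), so that a Republican Electoral College win does.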

Beyond calculating misalignment probability, we can use this formalism to see whether the Electoral College supports or works against a candidate in terms of whether winning or losing the popular vote is consistent with their win of the Presidential election. If it is more likely for a candidate to win the Electoral College while losing the popular vote than it is for the candidate to lose the Electoral College while winning the popular vote, we can interpret this result as the Electoral College being biased (in the statistical sense⁶) towards that candidate. That is, the Electoral College is more likely to not reflect the will of the people when said candidate is elected than when the competing candidate is elected.

We can measure the extent of the bias by computing the normalized difference between each term that makes up the misalignment probability. We define this bias as

\begin{equation}
\text{\(D-R\) Electoral College Bias} = \frac{\text{Prob}\,(e_D > e_R \cap n_D < n_R)- \text{Prob}\,(e_D < e_R \cap n_D > n_R)}{\text{Prob}\,(e_D > e_R \cap n_D < n_R)+ \text{Prob}\,(e_D < e_R \cap n_D > n_R)} \qquad (15)
\end{equation}

If \(D-R\) Electoral College Bias \(=1\) then the Electoral College was completely biased (again in the statistical sense) towards the Democratic candidate. This means that there was no way for the Republican candidate to win the Electoral College without also winning the Popular vote, but the Democratic Candidate could win the Electoral College without also winning the Popular vote. A value of \(-1\) corresponds to the opposite situation, and a value of \(0\) means there is no bias, i.e., it’s equally likely for either candidate to win the Electoral College while losing the Popular vote.
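As a sketch of how Eq.(12) and Eq.(15) could be evaluated from simulated outcomes, the hypothetical helper below takes two 0/1 arrays (one entry per simulated election) and returns the misalignment probability and the bias. The function name and the toy inputs are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np

def misalignment_and_bias(d_elec_win, d_pop_win):
    """Estimate Eq.(12) and Eq.(15) from 0/1 outcome arrays.

    Ties are ignored for simplicity: a 0 is read as an R win.
    """
    d_elec_win = np.asarray(d_elec_win, dtype=bool)
    d_pop_win = np.asarray(d_pop_win, dtype=bool)

    # Prob(e_D > e_R and n_D < n_R)
    p_d_misaligned = np.mean(d_elec_win & ~d_pop_win)
    # Prob(e_D < e_R and n_D > n_R)
    p_r_misaligned = np.mean(~d_elec_win & d_pop_win)

    total = p_d_misaligned + p_r_misaligned
    # The bias is undefined when no misalignment occurs at all
    bias = (p_d_misaligned - p_r_misaligned) / total if total > 0 else float("nan")
    return total, bias

# Toy example: five simulated elections
total, bias = misalignment_and_bias([1, 1, 0, 0, 1], [1, 1, 1, 1, 0])
print(total, bias)
```

In the toy input, one of five elections is a misaligned Democratic win and two are misaligned Republican wins, so the bias comes out negative. The guard against a zero denominator matters in practice: if a simulation produces no misaligned outcomes, Eq.(15) has no meaningful value.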

With these definitions, we can now start collecting data to estimate misalignment probabilities and biases. We will use the 2020 election as our frame of reference, but, of course, these results can be extended to the current election year.

Data Collection and Parameter Estimates

Let’s imagine it is November 2nd, 2020, the night before the Presidential election. We want to use the above model to predict not only the probability that one candidate will win the Electoral College but also the probability that the election results will be “misaligned,” i.e., that the candidate who wins the Electoral College also loses the popular vote. We want the total probability this will occur for either candidate and beyond this, we want to know for which candidate such misalignment is more likely. Knowing the latter will let us know whether this election’s electoral map has an anti-populist bias for one candidate or the other.

From Eq.(9), it is clear that we will need to determine the quantities \(\mu_{\alpha}\), \(\sigma_{\alpha}\), \(m_{\alpha}\), and \(s_{\alpha}\) for all states \(\alpha\). That is, we want the expected values and widths of the margin and total number of votes for each state. The two data sources we will use to estimate these quantities are the 2020 polling results from 270towin.com and vote count statistics for the 2000 to 2016 Presidential elections (remember we’re pretending we don’t know the votes for 2020).

Margin Data Collection

We want to estimate the mean and variance (i.e., \(\mu_{\alpha}\) and \(\sigma_{\alpha}^2\)) of

\begin{equation}
\delta_{\alpha} \equiv \frac{n_{D, \alpha} - n_{R, \alpha} }{n_{D, \alpha} + n_{R, \alpha} }
\end{equation}

for all states \(\alpha\). To do so, we can collect pre-election polling data for each state, compute the difference between the Democratic and Republican vote percentages and then compute the relevant statistics for the computed differences. For example, on the site https://www.270towin.com/2020-polls-biden-trump/arizona/ we see the following Arizona pre-election polling table

From this table, it is straightforward to compute the Biden-Trump margin for each poll and then compute the mean and variance across the five most recent polls relative to the election day. These quantities will then serve as estimates \(\mu_{\text{Arizona}}\) and \(\sigma_{\text{Arizona}}^2\). To compute the full set of \(\mu_{\alpha}\) and \(\sigma_{\alpha}\), we just need to do this for all states.
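As a worked example of this step, the sketch below computes the margin statistics from five made-up polls. The percentages are hypothetical and are not the actual Arizona numbers.

```python
import numpy as np

# Hypothetical results (in percent) from five recent polls of one state
biden = np.array([48.0, 50.0, 49.0, 47.0, 51.0])
trump = np.array([46.0, 47.0, 48.0, 47.0, 45.0])

# Margin per poll, expressed as a fraction
diff = (biden - trump) / 100

mu_hat = np.mean(diff)   # estimate of mu_alpha
var_hat = np.var(diff)   # estimate of sigma_alpha^2 (population variance, ddof=0)

print(mu_hat, var_hat)
```

Two modeling choices are worth noting. Dividing by 100 keeps undecided and third-party respondents in the denominator, which differs slightly from the two-party normalization in the definition of \(\delta_{\alpha}\). And with only five polls, `np.var` with its default `ddof=0` gives a smaller value than the unbiased sample variance (`ddof=1`).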

To streamline this process, we can write a script to scrape the 270towin site and then store the computed quantities in a dictionary. Here is an example section of the script

import io

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

# whether to include conservative correction
correction_ = True
size_ = 0.03

# going through state list for non-congressional districts
# compiling mean and variance data
for state in tqdm(state_list):
    # skips the congressional districts (names containing 1, 2, or 3)
    if not any(digit in state for digit in '123'):
        # get response for website
        state_short = reduced_state_dict[state]
        url = f"https://www.270towin.com/2020-polls-biden-trump/{state_short}/"
        response = requests.get(url)

        # parse data from the html into a beautifulsoup object
        soup = BeautifulSoup(response.text, 'html.parser')
        find_table = soup.find_all('table', {'id': "polls"})

        # reading the first matching table into a dataframe
        # (wrapped in StringIO because newer pandas versions expect a file-like object)
        df = pd.read_html(io.StringIO(str(find_table[0])))[0]

        # computing biden trump poll difference for every row
        df['Diff'] = (df['Biden'].str.strip('%').astype(float)
                      - df['Trump'].str.strip('%').astype(float))/100

        # removing the header and getting first five polls
        # offsetting index if 'averages' is first element
        idx0 = sum([True for elem in list(df['Source']) if 'verage' in str(elem)])
        df_cut = df.iloc[idx0:idx0+5]

        # computing mean and variance of most recent five polls
        mean_ = np.mean(df_cut['Diff'])
        var_ = np.var(df_cut['Diff'])

        # filling in delta dictionary
        # incorporating hidden conservative lean
        delta_dict[state]['mean'] = conserv_correc(mean_, correction_, size_)
        delta_dict[state]['var'] = var_

(Note: This code is part of a larger notebook and will not run on its own. See the notebook at the end for a full executable file.)

One thing that should be mentioned about the above code is the function conserv_correc. The function is defined as

def conserv_correc(mean, include=True, size=0.03):
    """
    Polls today (i.e., circa 2020) seem to 
    underestimate conservative preference. We
    include a small correction to account for this bias
    """
    if include:
        return mean - size
    else:
        return mean

The function shifts the average margin of the polls by a certain amount to account for the fact that the 2016 polls underestimated the conservative lean. We assume that the 2020 polls would do the same⁷ and account for this underestimation by this small shift.

With the mean and variance of the polling margins for various states, we can now access a single state to determine its relevant statistics. For example, for Washington D.C. we have

Total Votes Data Collection

For the total votes data collection, we want to estimate the mean \(m_{\alpha}\) and variance \(s^2_{\alpha}\) of the total number of votes for each state in the 2020 election. I wasn’t able to find data on pre-election day turnout projections for 2020, so we will take a more predictive modeling approach: For each state, we will collect the total number of ballots cast in the presidential elections from 2000 to 2016, and we will use those five data points to train a linear regression that forecasts the turnout in 2020. The 2020 prediction for state \(\alpha\) will stand in for the mean \(m_{\alpha}\) and the mean squared error of the model will stand in for the variance \(s^2_{\alpha}\).

To collect this data we again perform some web scraping, but we will use Wikipedia as our data source. For the 2000 to 2016 election years, Wikipedia keeps track of the number of ballots cast in each state. For example, a section of the 2016 table looks like

(The total number of votes is at the right end of the table and is not shown in the image.) By scraping this table for each year, we can collect the total number of votes for each state in each presidential election. The code to do this looks like

# list of states in alphabetical order
state_list = list(electoral_votes.keys())

# years in string and integer form
years_string = ['2000', '2004', '2008', '2012', '2016']
years_int = np.array([int(year) for year in years_string])

# votecount dictionary of dataframes
votecount_df_dict = dict()

for year in tqdm(years_string):
    wikiurl=f"https://en.wikipedia.org/wiki/{year}_United_States_presidential_election#Results_by_state"
    response=requests.get(wikiurl)

    # parse data from the html into a beautifulsoup object
    soup = BeautifulSoup(response.text, 'html.parser')
    find_table=soup.find_all('table',{'class':"wikitable"})

    # getting table with electoral votes
    for table in find_table:
        if 'Iowa' in str(table) and 'Alabama' in str(table):
            table_key = table

    # getting first table
    df=pd.read_html(str(table_key))
    # convert list to dataframe
    df_orig=pd.DataFrame(df[0])

    # dropping the highest level column
    df_drop = df_orig.copy()
    df_drop.columns = df_orig.columns.droplevel()

    # converting state/district name to just state
    df_drop.rename(columns = {df_drop.columns[0]: 'State'}, inplace = True)

    # getting starting index for state names
    row_names = list(df_drop['State'])
    for k in range(len(row_names)): 
        if 'Ala' in str(row_names[k]):
            start_idx = k
            break  

    # getting column name for vote count
    first_level_names = list(df_orig.columns.droplevel(1))
    second_level_names = list(df_orig.columns.droplevel(0))
    for elem1, elem2 in zip(first_level_names, second_level_names):
        if 'Total' in elem1:
            break

    # compiling dictionary of total votes
    # (56 rows: the states, D.C., and the Maine and Nebraska districts)
    data_dict = {'State': state_list,
    'Total Votes': np.array(list(df_orig.iloc[start_idx:start_idx+56][(elem1, elem2)])).astype(int)}

    # creating dataframe for year
    votecount_df_dict[year] = pd.DataFrame.from_dict(data = data_dict)

(Note: This code is part of a larger notebook and will not run on its own. See the notebook at the end for a full executable file.)

The total vote count data for each state in a particular year is stored in votecount_df_dict[year]. With this data, we can then predict the total number of votes for 2020 and the variance of the prediction.

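from collections import defaultdict
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

votecount_dict = defaultdict(dict)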
for state in state_list:

    # collecting yearly ballot data for state
    ballot_count = list()
    for yr in years_string:
        temp_df = votecount_df_dict[yr]
        count = temp_df[temp_df['State']==state]['Total Votes'].iloc[0]
        ballot_count.append(count)
    ballot_count = np.array(ballot_count)    
    
    # fitting linear regression
    linreg = LinearRegression()
    linreg.fit(years_int.reshape(-1, 1), ballot_count.reshape(-1, 1))

    # predictions and true values
    predictions = linreg.predict(years_int.reshape(-1,1))
    true_values = np.array(ballot_count).reshape(-1, 1)
    
    # mean square error and predicted 2020 result
    pred_result = linreg.predict(np.array([[2020]]))[0][0] 
    mean_sqr_error = mean_squared_error(true_values, predictions)
    
    # adding to dictionary
    votecount_dict[state]['mean'] = pred_result
    votecount_dict[state]['var'] = mean_sqr_error
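The regression step can also be sketched on its own without scikit-learn. The example below uses `numpy.polyfit` on made-up turnout numbers for a single hypothetical state. It mirrors the logic above (linear fit, 2020 extrapolation, in-sample mean squared error) but is not the notebook's code.

```python
import numpy as np

years = np.array([2000, 2004, 2008, 2012, 2016])
# Hypothetical ballots cast in one made-up state
ballots = np.array([210_000, 240_000, 252_000, 247_000, 256_000])

# Least-squares line through the five points (years centered for numerical stability)
x = years - years.mean()
slope, intercept = np.polyfit(x, ballots, deg=1)

# Extrapolated 2020 turnout, standing in for m_alpha
pred_2020 = slope * (2020 - years.mean()) + intercept

# In-sample mean squared error, standing in for s_alpha^2
fitted = slope * x + intercept
mse = np.mean((ballots - fitted) ** 2)

print(pred_2020, mse, np.sqrt(mse))
```

One caveat on this design: an in-sample error computed from five points and two fitted parameters likely understates the true forecast uncertainty, especially for an extrapolation beyond the fitted range, so the resulting \(s_{\alpha}\) should be read as a rough lower-end estimate.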

To visually depict this projection, we can plot, for example, the 2020 total votes projection for Wyoming compared with the actual value and the values from previous years.

In this case, we see that the true 2020 value exists outside the 68% confidence interval for the prediction. Still, we will use this simple extrapolation model to predict turnout for each state.

With the mean and variance of the total votes for various states, we can now access a single state to determine its relevant statistics. For example, for Wyoming we have

Simulation

Having estimated \(\mu_\alpha\), \(\sigma^2_\alpha\), \(m_{\alpha}\), and \(s^2_{\alpha}\) for each state \(\alpha\), we now have estimates of the probability distributions in Eq.(9) and we can compute Eq.(10) and Eq.(11), the probabilities of Popular vote and Electoral College win, respectively. Since both quantities are integrations over probability distributions, we can use Monte Carlo integration to evaluate them. Namely, rather than computing the integrals through a standard numerical quadrature, we can sample the space of points according to the probability distribution, compute the non-distribution integrand for each sample, and take the average of the result.
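To illustrate the idea on a case with a known answer, the sketch below estimates \(\text{Prob}\,(x > 0)\) for a single normal variable by Monte Carlo and compares it to the closed form. The parameter values are arbitrary and serve only as an example.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)

mu, sigma = 0.03, 0.05      # arbitrary mean and standard deviation
n_samples = 100_000

# Sample from the distribution, then average the indicator Theta(x)
x = rng.normal(mu, sigma, size=n_samples)
p_mc = np.mean(x > 0)

# Standard error of a Monte Carlo estimate of a probability
std_err = math.sqrt(p_mc * (1 - p_mc) / n_samples)

# Exact value from the normal CDF: Prob(x > 0) = Phi(mu / sigma)
p_exact = 0.5 * (1 + math.erf(mu / (sigma * math.sqrt(2))))

print(p_mc, std_err, p_exact)
```

The same standard error formula applies to any probability estimated this way. With 10,000 samples, for instance, it is at most \(\sqrt{0.25/10000} = 0.005\), so estimated probabilities are good to roughly one percentage point.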

First, we convert the collected data into the mean vectors and (diagonal) covariance matrices that hold the parameters \(\mu_\alpha\), \(\sigma^2_\alpha\), \(m_{\alpha}\), and \(s^2_{\alpha}\).

# ballot count dictionary; assuming independent variances
n_mean_vec = np.array([votecount_dict[state_]['mean'] for state_ in state_list])
n_cov_matrix = np.diag([votecount_dict[state_]['var'] for state_ in state_list])

# delta dictionary; assuming independent variances
delta_mean_vec = np.array([delta_dict[state_]['mean'] for state_ in state_list])
delta_cov_matrix =  np.diag([delta_dict[state_]['var'] for state_ in state_list])

# electoral college vector
lambda_vector = np.array([electoral_votes[state_] for state_ in state_list])

Then we can simulate the possibilities with the following code.

##
# Democrat Win Calculation
##


# defining H function
H = lambda x: np.heaviside(x, 0)-np.heaviside(-x, 0)

# for sampling from normal distribution
sample_vector = lambda mean, cov: np.random.multivariate_normal(mean = mean, cov = cov)

# number of times to simulate election
Nsim = 10000

# lambda vector
lambda_vec = list()

# winning electoral college
dems_electoral_wins = list()

# winning popular vote
dems_popular_wins = list()

# winning both electoral college and popular vote
dems_elec_and_pop_wins = list()

# differences
pop_vote_diff = list()
elec_vote_diff = list()

# going through simulations
for _ in tqdm(range(Nsim)):
    
    # ballot count vector
    n_vector = sample_vector(n_mean_vec, n_cov_matrix)

    # difference vector
    delta_vector = sample_vector(delta_mean_vec, delta_cov_matrix)

    # popular vote win
    popular_win = np.heaviside(np.dot(delta_vector, n_vector), 0)

    # electoral vote win
    electoral_win = np.heaviside(np.dot(lambda_vector, H(delta_vector)), 0)

    # electoral and popular vote win
    electoral_popular_win = popular_win*electoral_win

    # appending election statistics
    pop_vote_diff.append(np.dot(delta_vector, n_vector))
    elec_vote_diff.append(np.dot(lambda_vector, H(delta_vector)))
    
    # appending election result
    dems_electoral_wins.append(electoral_win)
    dems_popular_wins.append(popular_win)
    dems_elec_and_pop_wins.append(electoral_popular_win)

From these simulations, we can compute the 2020 Democratic candidate’s (Joe Biden) chances of winning either the popular vote or the Electoral College.

This model suggests that, given pre-election polling data (including the “conservative correction”), Biden had an 80% chance of winning the Electoral College but was essentially guaranteed to win the popular vote. Another way to see this result is to show histogram distributions of the popular and Electoral College vote differences.

We see that the Democrat-Republican difference in popular votes is normally distributed with a peak at ~ 7 million votes⁸ with no part of the distribution extending below zero. On the other hand, the Democrat-Republican difference in electoral votes, although showing the Democratic candidate winning most of the time, shows a non-negligible number of instances of a Republican win.

This result already hints at the answer to a question that started this investigation. If the Republican candidate could have won the electoral college vote but could not also win the popular vote, then there must be a non-zero probability of misalignment in this election and moreover, the total misalignment probability must have come primarily from only one candidate’s misaligned win.

Non-Symmetric Misalignment

At last, we can turn to the question that prompted this investigation. We rewrite the question in the context of the 2020 election for convenience.

What was the probability of electoral misalignment for the 2020 election and which candidate was more likely to be misaligned if they won the electoral vote?

To answer this question, we can use a simulation script similar to the one given above, but tailored to calculate the joint probabilities Eq.(13) and Eq.(14):

##
# Misalignment Probability Calculation
##

# defining H function
H = lambda x: np.heaviside(x, 0)-np.heaviside(-x, 0)

# number of times to simulate election
Nsim = 10000

# lambda vector (electoral votes per state): lambda_vector is reused from the script above

# winning electoral college
dems_electoral_wins = list()

# winning popular vote
dems_popular_wins = list()

# winning electoral college and not winning popular vote
dems_elec_win_pop_loss = list()

# winning popular vote and not winning electoral vote
dems_pop_win_elec_loss = list()

# going through simulations
for _ in tqdm(range(Nsim)):
    
    # ballot count vector
    n_vector = sample_vector(n_mean_vec, n_cov_matrix)

    # difference vector
    delta_vector = sample_vector(delta_mean_vec, delta_cov_matrix)

    # popular vote 
    dems_popular_win = np.heaviside(np.dot(delta_vector,  n_vector), 0)
    reps_popular_win = np.heaviside(np.dot(-delta_vector, n_vector), 0)
    
    # electoral vote 
    dems_electoral_win = np.heaviside(np.dot(lambda_vector, H(delta_vector)), 0)
    reps_electoral_win = np.heaviside(np.dot(lambda_vector, H(-delta_vector)), 0)

    # appending election results
    dems_electoral_wins.append(dems_electoral_win)
    dems_popular_wins.append(dems_popular_win)
    dems_elec_win_pop_loss.append(dems_electoral_win*reps_popular_win)
    dems_pop_win_elec_loss.append(reps_electoral_win*dems_popular_win)

From running these simulations, we find

These results show us two things. First, the misalignment probability was about 22% for the 2020 election. Second, the entirety of this misalignment came from the Republican candidate winning the Electoral College but losing the popular vote (i.e., the Democratic candidate winning the popular vote but losing the Electoral College). The fact that the \(D-R\) Electoral College Bias is equal to -1 is another way to represent this fact. It implies that in all the cases where the Electoral College went against the popular vote, the Democratic candidate lost the presidency.
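A minimal sketch of how these two numbers could be computed from the indicator lists built in the loop. It assumes the \(D-R\) bias is the difference between the two misaligned-win probabilities divided by their sum, which is consistent with a value of -1 when all misalignment favors the Republican candidate; that definition, the helper name `misalignment_summary`, and the toy indicators are assumptions made for illustration.

```python
import numpy as np

def misalignment_summary(dems_elec_win_pop_loss, dems_pop_win_elec_loss):
    """Return (total misalignment probability, assumed D-R bias)."""
    # Democrat wins the Electoral College but loses the popular vote
    p_dem = float(np.mean(dems_elec_win_pop_loss))
    # Republican wins the Electoral College but loses the popular vote
    p_rep = float(np.mean(dems_pop_win_elec_loss))
    p_total = p_dem + p_rep
    # assumed definition: normalized difference, undefined if no misalignment occurs
    bias = (p_dem - p_rep) / p_total if p_total > 0 else float("nan")
    return p_total, bias

# toy indicators (hypothetical): 2 of 10 simulations misaligned, both favoring the Republican
toy_dem = [0] * 10
toy_rep = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
p_total, bias = misalignment_summary(toy_dem, toy_rep)
print(f"misalignment probability: {p_total:.2f}, D-R bias: {bias:.1f}")
```

Under this assumed definition the bias ranges from -1 (all misaligned outcomes hurt the Democratic candidate) to +1 (all misaligned outcomes hurt the Republican candidate), with 0 meaning a symmetric map.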

In other words, the composition of the likely Electoral College votes had an anti-popular bias against the Democratic candidate and in favor of the Republican candidate. The Republican candidate could win the Election without winning the popular vote, but this was not true for the Democratic candidate. We can see this more clearly by redrawing our square diagram with some numbers filled into the boxes.
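To make the square diagram concrete, here is a self-contained toy version of the calculation on a hypothetical three-state map. The electoral votes, ballot counts, margins, and polling uncertainty are all invented for illustration (one large, safely Democratic state and two small, closely contested ones), and ballot counts are held fixed rather than sampled, so the numbers say nothing about 2020 itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim = 10_000

# hypothetical three-state map (all values invented for illustration)
lambda_vector = np.array([10, 6, 6])          # electoral votes per state
n_vector = np.array([10e6, 3e6, 3e6])         # ballots cast per state (held fixed here)
delta_mean = np.array([0.20, -0.02, 0.01])    # mean D-R margin as a vote fraction
delta_std = 0.03                              # same polling uncertainty in every state

# sample state margins for all simulations at once: shape (n_sim, 3)
delta = rng.normal(delta_mean, delta_std, size=(n_sim, 3))

pop_diff = delta @ n_vector                   # D-R popular vote difference
elec_diff = np.sign(delta) @ lambda_vector    # D-R electoral vote difference

dem_pop = pop_diff > 0
dem_elec = elec_diff > 0   # exact ties have probability zero here and are ignored

# 2x2 joint table: rows = electoral winner (D, R), columns = popular winner (D, R)
table = np.array([
    [np.mean(dem_elec & dem_pop),  np.mean(dem_elec & ~dem_pop)],
    [np.mean(~dem_elec & dem_pop), np.mean(~dem_elec & ~dem_pop)],
])
print(table)
```

The off-diagonal entries are the two misalignment probabilities. In this toy map only the lower-left entry (Republican electoral win, Democratic popular win) is non-zero, because the Democratic popular margin comes almost entirely from one large state while the electoral outcome hinges on the two small ones.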

The model suggests that Biden was essentially guaranteed to win the popular vote, but not guaranteed (albeit a likely favorite) to win the Electoral College.

A higher probability of a misaligned win (i.e., an Electoral College win together with a popular vote loss) for one candidate over another suggests that the electoral map has an anti-popular bias against the candidate with the lower probability of winning with misalignment.

When one reflects on the origins of the Electoral College system and on the nature of the 2020 (or 2016) election, one sees an irony in the lopsided nature of this misalignment. The Electoral College system was instituted largely to avoid electing into positions of power the sort of anti-elite populist candidate that Donald Trump’s candidacy represented (albeit superficially). Describing the origins of this system, Phillip J VanFossen notes that the Constitutional founders

“…believed that the electors would ensure that only a qualified person became president. And they thought the Electoral College would serve as a check on a public who might be easily misled.”

Thus the Electoral College system, by definition, was supposed to be “election according to an established elite.” And yet in 2016 and 2020, more than 200 years from that time, it is that very system that made it more likely than otherwise that a populist (again superficially) candidate would be elected to the Presidency against the majority-count desires of the wider population.


Companion Notebook

https://github.com/mowillia/abstractions/blob/main/2023_mc_election_simulation.ipynb

Footnotes

1. There’s also the small possibility of tying in the Electoral College and the much less likely possibility of tying for the popular vote, but we ignore these two cases.
2. This is admittedly a strong assumption (i.e., that an equal number of people would have voted for the Democratic and Republican candidates if the third parties were not available).
3. Actually, it is if they win a “plurality” of the vote, but when we restrict the options to two candidates then this amounts to a majority.
4. Folk wisdom is that high turnout favors Democrats, which makes this seem like a bad assumption for American politics in the 21st century. However, detailed analyses suggest the relationship between turnout and party win likelihood is not simple. https://politics.stackexchange.com/questions/61009/does-high-turnout-on-average-actually-only-slightly-favor-democrats
5. In the integrals, we are integrating over all of \(\mathbb{R}^M\) space but we know the margin is bounded above by 1 and below by 0 while the number of votes is bounded below by 0. We assume that the widths of the distributions are sufficiently small that these boundaries are essentially at infinity.
6. In statistics, bias means that the computed statistic (in this case the electoral college vote) is not an accurate reflection of the population. https://en.wikipedia.org/wiki/Bias_(statistics)
7. Turns out they did: https://www.pewresearch.org/methods/2021/03/02/what-2020s-election-poll-errors-tell-us-about-the-accuracy-of-issue-polling/
8. The position of this peak adds credence to the model. The actual popular vote difference was \(81,283,501-74,223,975 = 7,059,526\); Link