NYT Digits Puzzle – Generating Functions, Catalan Numbers, Complexity

(NOTE: As of August 18, 2023 the NYT Digits puzzle has been removed the the NYT website. A copy of the game has been reproduced on a new private site: https://engaging-data.com/digitsgame/)

In the previous post, we wrote an algorithm to solve the NYT digits puzzle. Now, we will consider the structure of the puzzle to understand something quite simple: What is the time complexity of our previous algorithm?

Along the way we will again formulate the problem recursively and use the OEIS to find relationships between the sequence of results and other mathematical sequences. Finally, we will use generating functions to find a closed form expression for the computational complexity of the algorithm.

Problem Formulation
Generating Function Solution
Catalan Numbers
Computational Complexity

Problem Formulation

To start, we can write the digits puzzle as a more general problem

We have $N$ integers $d_1 < d_2 < \ldots < d_N$. We have four possible operations $\times$, $\div$, $+$, and $-$. We want to know the number of different real numbers (that are greater than 1) we can form from combining all $N$ integers using the available operations.
(a) Say that we have two different integers. How many different combinations can we form from these integers (given that we only want combinations that result in numbers greater than 1)
(b) Same question as above but we have three different integers
(c) What is the total number of real-number combinations that we can form from $N$ integers

The ultimate goal is the answer to (c): An expression for the total number of combinations we can form from $N$ integers. Such a result would correspond to the number of keys in the output of generator_combos when the input start_dict has $N$ different key-value pairs.

Two Number Combinations

As before, we start with the smallest possible situation: Two numbers. In this case, we already know the answer. The possible operations we could have are

\begin{equation}
d_2+d_1, \, d_2-d_1, \, d_2\times d_1,\, d_2/d_1.\quad
\end{equation}

Note that we exclude the operations $d_1+d_2$ and $d_1\times d_2$ because addition and multiplication are commutative, and we exclude the two operations $(d1/d2)$ and $d_1-d_2$ because we require the output of subtraction to be greater than zero and the output of division to be greater than 1.

Let $Q_{k}$ be the number of unique integer combinations we can make with $k$ numbers (according to the specification that subtraction only yields non-zero integers and division only yields numbers greater than 1). We thus have
\begin{equation}
\qquad Q_{2} = 4. \qquad (1)
\end{equation}

Three Number Combinations

Now let’s consider the case of three numbers that we want to combine. There are $\binom{3}{2}$ ways we can select two numbers from that set of three. When we select those two numbers, there are $Q_{2}$ ways we can generate operation combinations from them in order to yield a single number. We would then be left with the single number generated from the operation combinations and the number that we did not choose from the “$3$ choose $2$” operations. For these two numbers, there are $Q_2$ ways to combine them. Therefore, $Q_3$ is
\begin{equation}
\qquad Q_3 = Q_2\binom{3}{2}Q_2 = 48. \qquad (2)
\end{equation}

$N$ Number Combinations

We now want to generalize this result to the case where we have an arbitrary $N$ numbers to combine. To get to closer to this generalization, we consider the answer for $Q_4$. Akin to the result for $Q_{3}$, we note that there are $\binom{4}{3}$ ways to select three numbers from a set of four. After we select those numbers, there are $Q_3$ ways we can generate operation combinations that set of three. After these combinations, we would have a single number generated from the operation combination of the three numbers and the number we did not choose from the “$4$ choose $3$” operations. There are $Q_2$ ways to combine these final two numbers and thus we find that one term of $Q_4$ is

\begin{equation}
\qquad Q_4 = Q_2\binom{4}{3}Q_3 + \cdots. \qquad (3)
\end{equation}

To find the other term, we do this operation again with a larger set of selected numbers. There are $\binom{4}{2}$ ways to select two numbers from a set of four. For the selected numbers there are $Q_2$ ways to combine them to generate a single number. For the unselected two numbers, there is also $Q_2$ ways to combine them into a single number. After these two separate combinations we would be left with single numbers drawn from each set, which we can then combine in $Q_2$ ways to yield a final result. Thus, there are $Q_2 \binom{4}{2} Q_2 Q_2$ ways to do this combination and the full expression for $Q_4$ is

\begin{equation}
Q_4 \stackrel{?}{=} Q_2\binom{4}{3}Q_3 + Q_2 \binom{4}{2} Q_2 Q_2.
\end{equation}

However, we have done a bit of double counting. Say we have the number $(1, 2, 3, 4)$. If we choose $(1, 2)$ we have $(3, 4)$ left over. Alternatively, if we choose $(3, 4)$ we have $(1, 2)$ left over. Combining $(1, 2)$ in all possible ways and then combining the result with all possible combinations of $(3, 4)$ is the same as combining $(3, 4)$ in all possible ways and then combining the result with all possible combinations of $(1, 2)$. However, the factor $\binom{4}{2}$ counts both sets of combinations as distinct. To correct for this we need to multiply it by $1/2$. We then have the result

\begin{equation}
Q_4 {=} Q_2\binom{4}{3}Q_3 + \frac{1}{2} Q_2 \binom{4}{2} Q_2 Q_2 = 960.
\end{equation}

We can write this result more cleanly(and pave the way towards generalization) by introducing $Q_1 =1$. We then have

\begin{equation}
\qquad Q_4 {=} \frac{1}{2} Q_2 \left[\binom{4}{3} Q_3 Q_1 + \binom{4}{2} Q_2 Q_2 + \binom{4}{1} Q_{1} Q_{3}\right]. \qquad (4)
\label{eq:Q4}
\end{equation}

Given the form \rfw{Q4}, we can now generalize this result to the case where we have $N$ integers. Matching the form of the summations, we have

\begin{equation}
\qquad Q_{N} = \frac{1}{2}Q_2 \sum_{k=1}^{N-1} \binom{N}{k} Q_{N-k} Q_{k}. \qquad (5)
\end{equation}

Above, we found $Q_2=4$ for the NYT digits puzzle setup, but the above recursive relation is valid independent of the exact value of $Q_2$. For example, if we added the “exponent” operation to our previous four operations, we would find $Q_2 = 6$ which would change $Q_3$ and so on, but not the relationship between the $Q_k$s as expressed in Eq.(5).

We can straightforwardly solve Eq.(5) through a dynamic programming solution:

from math import comb
import numpy as np

Q, Q2 = dict(), 4 # define dictionary and Q2 value
Q[1] = 1 # define first dictionary value

def Q_calc(N):
    
    if N in Q.keys():
        return Q[N]
    else:
        return (Q2//2)*np.sum([comb(N,k)*Q_calc(N-k)*Q_calc(k) for k in range(1,N)])

which yields, for the first few values of $N$,

This result matches that found by computing the total number of combinations created by the generator_combos function discussed in NYT Digits Puzzle – Algorithm:

Thus we see that Eq.(5) indeed defines the number of combinations produced in the generator_combos algorithm. Moreover, if we plug this sequence into the OEIS, we find that the sequence appears to be connected to the Catalan numbers, a sequence which appears in many combinatorial contexts involving trees. To see this connection explicitly, it would be most helpful to find an explicit expression for $Q_N$.

Generating Function Solution for $Q_N$

Having used Eq.(5) to write a program to compute $Q_N$, we would now like obtain an analytical expression for $Q_N$. Doing so would allow us to have a closed-form expression for the number of computation combinations generated by the algorithm in the previous post and would also make the relationship with Catalan numbers more explicit. As a first step we define the exponential generating function $Q(x)$ as

\begin{equation}
\qquad Q(x) \equiv \sum_{k=1}^\infty \frac{x^k}{k!} Q_{k}. \qquad (6)
\end{equation}

We note that $Q(x)$ satisfies $Q(x=0) =0$; we will use this result to constrain the possible solutions to an equation that defines $Q(x)$. Multiplying both sides of Eq.(5) by $x^N/N!$ and summing the result from $N=2$ to $N=\infty$, we find

\begin{equation}
\sum_{N=2}^{\infty} \frac{x^N}{N!} Q_{N} = \frac{Q_2}{2} \sum_{N=2}^{\infty} \sum_{k=1}^{\infty}\frac{1}{k!(N-k)!} Q_{N-k} Q_{k} \,x^{N-k} x^{k},
\end{equation}

which given, $Q_1 = 1$, yields

\begin{equation}
Q(x) – x = \frac{Q_2}{2} Q(x)^2.
\end{equation}

Solving this equation for $Q(x)$ in turn gives us
\begin{equation}
\qquad Q(x) = \frac{1}{Q_2} \left(1 – \sqrt{1 -2Q_2 x}\right), \qquad (7)
\label{eq:Qx}
\end{equation}

where we discarded the extraneous solution that does not yield $Q(x=0) = 0.$ Next, we expand the square root as a Taylor series:

\begin{equation}
\sqrt{1- u} = -\sum_{k=0}^{\infty} \frac{1}{2k-1} \binom{2k}{k}\left(\frac{u}{4}\right)^k.
\end{equation}

This expansion makes Eq.(7) become
\begin{align}
Q(x) & = \frac{1}{Q_2} \left(1+\sum_{k=0}^{\infty} \frac{1}{2k-1} \binom{2k}{k}\left(\frac{Q_2x}{2}\right)^k \right)\\
& = \frac{1}{Q_2} \sum_{k=1}^{\infty} \frac{1}{2k-1} \binom{2k}{k}\left(\frac{Q_2x}{2}\right)^k. \qquad (8)
\end{align}

Some combinatorial manipulation gives us
\begin{equation}
\frac{1}{2k-1} \binom{2k}{k} = \frac{2}{k} \binom{2k-2}{k-1} = 2 C_{k-1},
\end{equation}

where $C_{k} = \binom{2k}{k}/(k+1)$ are the Catalan numbers. With this new combinatorial coefficient, Eq.(8) becomes
\begin{equation}
Q(x) = \sum_{k=1}^{\infty}C_{k-1}\left(\frac{Q_2}{2}\right)^{k-1}x^{k},
\end{equation}

and from equating this result to Eq.(6), we can thus infer
\begin{equation}
\qquad Q_{N} = N! \left(\frac{Q_2}{2}\right)^{N-1} C_{N-1}. \qquad (9)
\end{equation}

Given the explicit expression for the Catalan numbers, the closed form expression for $Q_{N}$ is in turn
\begin{equation}
\qquad Q_{N} = \left(\frac{Q_2}{2}\right)^{N-1}\frac{(2N-2)!}{(N-1)!}.\qquad (9)
\end{equation}

$Q_N$ represents the number of ways to computationally combine $N$ numbers, given that $Q_2$ is the number of ways to computationally combine two numbers. For the case of $Q_2 = 4$ (as for the NYT digits puzzle), we find that the first six elements of the sequence are

$$ Q_N :\, 1, \,4, \,48, \,960, \, 26880, \,967680, \ldots $$

which matches the counting that we found through the dynamic programming implementation of Eq.(5).

(See NYT Digits Problem – Extended for an extended discussion of this approach)

Catalan Numbers

In Eq.(7), we found that $Q_N$ (statement of what $Q_N$ is) is related to the Catalan numbers through

\begin{equation}
\qquad Q_{N} = N! \left(\frac{Q_2}{2}\right)^{N-1} C_{N-1} \qquad
\end{equation}

The Catalan numbers are a sequence of numbers that often appear in combinatorial problems with some element of recursion. Why do they appear in the context of the NYT digits puzzle?

An explanation of their appearance comes from one of the applications of Catalan numbers described in the associated Wikipedia entry (which is itself taken from Enumerative Combinatorics by Richard Stanley [1] ).

$C_n$ is the number of different ways $n + 1$ factors can be completely parenthesized (or the number of ways of associating associating $n$ applications of a binary operator, as in the matrix chain multiplication problem). For $n = 3$, for example, we have the following five different parenthesizations of four factors:
\begin{equation}
((ab)c)d\quad (a(bc))d \quad(ab)(cd) \quad a((bc)d) \quad a(b(cd))
\end{equation}

When we are doing computations for the NYT digits game, we are essentially doing “parenthesizations” for a given ordering of numbers, where the operation that defines the parenthesization $ (ab)$ is chosen from one of the $Q_2=4$ operations that allow us to combine a pair of numbers. From here, the connection to Catalan numbers is more clear. We can construct $Q_N$ the number of ways to computationally combine $N$ numbers by multiplying the factors

$C_{N-1}$: Number of ways to “parenthesize” $N$ numbers for a given ordering
$N!$ : Number of possible orderings of $N$ numbers
$ (Q_2)^{N-1}$: Number of operation combinations for a given “parenthesized” combination
$1/2^{N-1}$: Combinatorial correction for double counting re-orderings of elements in a pair

which yields

$$ Q_{N} = N! \left(\frac{Q_2}{2}\right)^{N-1} C_{N-1},$$

as we previously found.

Computational Complexity

Computational complexity of an algorithm refers to the order of magnitude expression for the number of computations one needs in order to implement the algorithm. To determine the complexity of the nyt_digits_solver (which relies on the generator_combos algorithm), we in turn need to determine the asymptotic behavior of $Q_N$ (which counts the number of dictionary elements in generator_combos). Using Stirling’s approximation (i.e., $N! \sim \sqrt{2\pi N}(N/e)^N$), we find that for $N\gg1$, $Q_N$ in Eq.(10) becomes $Q_N \sim O\left((2Q_2)^{N}\right)$ which for $Q_2 = 4$ reduces to

$$ Q_N \sim O\left(2^{3N}\right).$$

Thus our NYT digits solver is in the exponential complexity class, the same class shared by the dynamic programming solutions to the knapsack problem and the traveling salesperson problems.

Final Remarks

We have come pretty far over the past two posts. From incremental programming to recursion. From generating functions to Catalan numbers. And finally to computational complexity. The NYT digits game is ostensibly a simple puzzle but the fact that it contains so many layers is a great example of how lots of beautiful mathematics and algorithmic formalism is often hidden in plain sight. Now, this digits game is just one of the many shown on the NYT games page. The previous analysis leads to the natural question: What are the algorithmic possibilities inherent in the other games?

Footnotes

[1] Stanley, Richard P. “Enumerative Combinatorics Volume 1 second edition.” Cambridge studies in advanced mathematics (2011).

NYT Digits Puzzle – Generating Functions, Catalan Numbers, Complexity

Problem Formulation

Two Number Combinations

Three Number Combinations

\(N\) Number Combinations

Generating Function Solution for \(Q_N\)

Catalan Numbers

Computational Complexity

Final Remarks

Footnotes

One comment on “NYT Digits Puzzle – Generating Functions, Catalan Numbers, Complexity”

Leave a Reply