Mastermind is a game all about information. The Code Master selects one of \( 6^4 = 1\,296 \) secret codes. Each incorrect guess gives us information by eliminating some of these; the more codes a guess rules out, the more information it has provided. Let's quantify this insight! Suppose a guess gets some response that reduces the number of possible codes from \(n\) to a smaller \(n'<n\). The convention in information theory, a branch of mathematics and computer science dealing with exactly this type of question, is to define the information \(I\) provided by this guess and response as \[ I := \log_2 \frac{n}{n'}. \] The unit of information (with respect to the base-2 logarithm; other bases can be used) is called the bit. One bit of information corresponds to halving the number of possible codes; a guess-response pair providing two bits of information cuts it to \( (1/2)^2 = 1/4 \) of its former size. There are good theoretical reasons for adopting this admittedly unintuitive apparatus. For our purposes it is enough to note that information is additive. If a first guess provides one bit of information, and a second two bits, the number of possible codes has been reduced to \( 1/2 \times 1/4 = 1/8 \) of the original; in total the two guesses have provided \( 1 + 2 = 3 \) bits of information!
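To make the definition concrete, here is a minimal sketch in Python (the function name and the example numbers are mine, not the program's): a response that cuts the 1 296 possible codes down to 162 provides \( \log_2 (1296/162) = \log_2 8 = 3 \) bits.

```python
from math import log2

def information(n, n_remaining):
    """Bits of information provided by a response that cuts
    the number of possible codes from n down to n_remaining."""
    return log2(n / n_remaining)

print(information(1296, 162))  # 3.0 bits: the codes were cut by a factor of 8
print(information(1296, 648))  # 1.0 bit:  the codes were halved
```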
We can now come up with a simple strategy for playing Mastermind: always make the guess that provides the most information. But we don't know the information of a guess in advance, since it depends on which response the Code Master will give us, and therefore on what the true secret code is. Let's suppose that the Code Master selects a secret code uniformly at random, with no code being more likely than any other; this is in fact what the program does under the hood. We then want to make the guess that provides the most information on average.
The probability of getting a certain response to a guess is \( n' / n \), since this is exactly the proportion of all possible codes that would give that response if it were the true secret. Let's call this probability \(p\); we can then rewrite the above expression for the information of this guess-response pair as \[ I = \log_2 \frac{1}{p}. \]
Our strategy should be to make the guess with the highest expected value of information over all possible responses; this expected information is called the entropy of the guess and is denoted by \( H \). (If you're familiar with entropy from thermodynamics, the connection is mostly historical; it is probably unhelpful to look for a conceptual connection.) Let \(p_i = n'_i / n\) be the probability of a guess getting response \(i\), where \(n'_i\) is the number of secret codes that would remain possible after that response. The entropy of a guess is defined as
\[ H := \sum_{\text{response } i} p_i \log_2 \frac{1}{p_i}, \]
which again is measured in bits. We have now landed on our final strategy: start by figuring out the number of possible secret codes \(n\). For each guess, calculate the number \(n'_i\) of codes that will still be viable if the Code Master gives response \(i\) in return. Do this for all possible responses. Finally, calculate the entropy of each guess and pick the one with the highest entropy. Surprise, surprise: this is exactly what the program above does! The number in bits listed under Best Guesses is the guess's entropy.
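Here is a minimal sketch of the whole procedure in Python. It is my own reconstruction, not the code running on this page: the function names are mine, and a real implementation would precompute the response table rather than recompute it on every call.

```python
from collections import Counter
from itertools import product
from math import log2

COLORS = range(6)
CODES = list(product(COLORS, repeat=4))  # all 6**4 = 1296 codes

def response(guess, secret):
    """The Code Master's response to a guess, as (black pegs, white pegs)."""
    black = sum(g == s for g, s in zip(guess, secret))
    # The per-color overlap counts correct colors regardless of position;
    # subtracting the exact matches leaves the white pegs.
    overlap = sum((Counter(guess) & Counter(secret)).values())
    return black, overlap - black

def entropy(guess, candidates):
    """Expected information of a guess, in bits, over the still-possible codes."""
    counts = Counter(response(guess, code) for code in candidates)
    n = len(candidates)
    return sum((n_i / n) * log2(n / n_i) for n_i in counts.values())

def best_guess(candidates):
    """The guess with the highest entropy against the current candidates."""
    # Break entropy ties in favor of guesses that could themselves be the
    # secret, so that the final guess actually ends the game.
    return max(CODES, key=lambda g: (entropy(g, candidates), g in candidates))
```

Note that `best_guess` ranks all 1 296 codes, not just the remaining candidates: a guess that can no longer be the secret may still split the candidates very informatively.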
So how well does this strategy perform? On average over all 1 296 secret codes, it beats the Code Master in 4.47 guesses (with a standard deviation of 0.75). Take the last significant digit with a grain of salt; there are usually many guesses with equal entropy, of which the program chooses one arbitrarily. Is this performance any good? For starters, the amount of information needed to narrow all \(n\) possible codes down to one is \(\log_2 n\) bits; this is displayed next to the number of possible codes. If you play around for a while, you'll notice that the best guess usually has an entropy of two to three bits. We would then expect it to take around four guesses to get the \(\log_2 1\,296 \approx 10.3\) bits of information needed at the start of a new game.
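Building on the sketch above, the following illustration (again my own, and slow in pure Python; the opening guess is always the same and would be cached in practice) plays out one game and counts the guesses.

```python
def play(secret):
    """Number of guesses the entropy strategy needs to crack one secret."""
    candidates = list(CODES)
    for turn in range(1, 11):
        guess = best_guess(candidates)
        if guess == secret:
            return turn
        fb = response(guess, secret)
        # Keep only the codes consistent with the feedback we received.
        candidates = [c for c in candidates if response(guess, c) == fb]

print(play((0, 3, 2, 5)))  # typically finds the secret in 4 or 5 guesses
```

Averaging `play` over all 1 296 secrets should land close to the 4.47 figure quoted above, with the exact value depending on how entropy ties are broken.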
This performance is also in line with other Mastermind algorithms. To the best of my knowledge, Donald Knuth published the first analysis of the game in his 1976 paper The Computer as Master Mind. In the decades since, many other strategies have been suggested, all achieving an expected number of around 4.4 guesses. (See e.g. Neuwirth, Some Strategies for Mastermind, 1982.) It is not surprising that maximizing entropy is a successful strategy; it solves the game with brute force. Computing the numbers \(n'_i\) of still-possible codes becomes expensive as the game grows; in the general case, even deciding whether any code is consistent with a given set of guesses and responses is NP-complete. (Stuckman & Zhang, Mastermind is NP-Complete, 2005.) The necessary lookup table quickly becomes unmanageable as the length of the code and the number of colors grow. At the end of the day, a bit more than four guesses seems to be as good as you can do on average: you simply don't get enough information to narrow down the options more quickly. This argument, of course, is informal; if anyone knows of a more rigorous one, please be in touch!
The idea of making the guess with the highest entropy is taken wholesale from 3Blue1Brown's excellent video on Wordle. Wordle is of course a variant of Mastermind, albeit a more engaging one. All code I have written for this project, including for this webpage, is available on GitHub. Published on June 11, 2025.