Sep 14

Some weeks ago, I rented the 2007 adventure film National Treasure: Book of Secrets on DVD, in which treasure hunter Benjamin Franklin Gates (played by Nicolas Cage) sets out to discover the truth behind the assassination of Abraham Lincoln. The movie’s first scene takes place in a tavern in Washington, D.C., five days after the end of the Civil War. Ben Gates’ great-great-grandfather Thomas Gates is approached by John Wilkes Booth and another member of the Knights of the Golden Circle, who ask him to decipher a secret message. The message has obviously been encrypted using the Playfair cipher and might lead them to a mythical city of gold called Cíbola.

As the Playfair cipher was state-of-the-art at the end of the Civil War in 1865, I wondered how someone (even if portrayed as a well-known puzzle solver) would be able to perform a successful ciphertext-only attack within just one or two hours, not having any frequency tables at hand and given a ciphertext consisting of only 22 digraphs (= pairs of letters). The following article will explain the basic concepts (encryption, decryption and cryptanalysis) of the Playfair cipher using the example from National Treasure: Book of Secrets.

The Playfair cipher[1] was the first digraph substitution cipher in history, that is, letters are sequentially encrypted and decrypted in pairs. This scheme was invented in 1854 by Charles Wheatstone, but bears the name of Lord Playfair[2], who promoted the use of the cipher. The digraph substitution makes frequency-based cryptanalysis significantly harder as one has to deal with 600 possible digraphs[3] rather than the 26 possible monographs. As a result, larger ciphertexts are necessary to perform a successful cryptanalysis compared to conventional monograph substitution ciphers. Due to this characteristic, the Playfair cipher was superior to many contemporary ciphers, and as it was also relatively easy to use, the British forces even employed it as a field cipher during World War I, about 50 years after the American Civil War.
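As a quick sanity check of the figure of 600 digraphs, the count can be reproduced in a few lines of Python. This is only a sketch under the assumptions stated in the footnote (I and J share one cell, no doubled letters within a digraph); the names `ALPHABET` and `digraphs` are my own.

```python
from itertools import permutations

# 25-letter Playfair alphabet: J is merged into I (assumption from the footnote).
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"

# Ordered pairs of two different letters; a Playfair digraph never repeats a letter.
digraphs = ["".join(pair) for pair in permutations(ALPHABET, 2)]

print(len(ALPHABET))  # 25
print(len(digraphs))  # 600
```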

So how did the fictional character Thomas Gates manage to decrypt the following (rather short) ciphertext?

ME IK QO TX CQ
TE ZX CO MW QC
TE HN FB IK ME HA
KR QC UN GI KM AV

Most likely, he simply guessed the correct keyword “DEATH” from the given hint “The debt that all men pay.” In that case, he would have constructed the corresponding 5×5 Playfair square by entering “DEATH”[4] in the first row[5] and filling up the rest of the square with the remaining letters of the alphabet[6].

D E A T H
B C F G I
K L M N O
P Q R S U
V W X Y Z
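
The construction of the square can be sketched in Python as follows. This is a minimal illustration, not the movie’s or any tool’s actual implementation; the function name `build_square` is hypothetical, and I assume that a J in the keyword is simply written as I.

```python
def build_square(keyword):
    """Return a 5x5 Playfair square as a list of five 5-letter strings."""
    alphabet = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # no J: I and J share one cell
    seen = []
    # Keyword letters first, then the rest of the alphabet, skipping duplicates.
    for ch in keyword.upper().replace("J", "I") + alphabet:
        if ch in alphabet and ch not in seen:
            seen.append(ch)
    return ["".join(seen[r * 5:(r + 1) * 5]) for r in range(5)]

for row in build_square("DEATH"):
    print(" ".join(row))
```

Because duplicates and non-letters are skipped, the same function also accepts keywords that are longer or shorter than five letters.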

The ciphertext can now be easily decrypted using the above Playfair square by applying the following four rules[7] to all ciphertext digraphs:

  1. If the two letters appear on the same row of the Playfair square, replace them with the letters to their immediate left respectively (wrapping around to the right side of the row if a letter in the original pair was on the left side of the row).
  2. If the two letters appear on the same column of the Playfair square, replace them with the letters immediately above respectively (wrapping around to the bottom side of the column if a letter in the original pair was on the top side of the column).
  3. If the letters are not in the same row or column of the Playfair square, replace them with the letters in the same row respectively but at the other pair of corners of the rectangle defined by the original pair. (The first decrypted letter of the pair is the one that lies in the same row as the first ciphertext letter.)
  4. Drop any extra “X” characters which don’t make sense in the final message.
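
Rules 1 to 3 above can be sketched in Python as shown below. This is a minimal illustration with names of my own choosing (`SQUARE`, `POS`, `transform`); the `step` parameter is an assumption I added so that the same function also covers the inverse (encryption) direction. Rule 4 is left to the human reader.

```python
SQUARE = ["DEATH", "BCFGI", "KLMNO", "PQRSU", "VWXYZ"]
# Map each letter to its (row, column) position in the square.
POS = {ch: (r, c) for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}

def transform(text, step):
    """Apply the Playfair rules pairwise; step=-1 decrypts, step=+1 encrypts."""
    letters = [ch for ch in text.upper() if ch in POS]
    out = []
    for a, b in zip(letters[0::2], letters[1::2]):
        (ra, ca), (rb, cb) = POS[a], POS[b]
        if ra == rb:    # rule 1: same row, shift horizontally with wrap-around
            out.append(SQUARE[ra][(ca + step) % 5] + SQUARE[rb][(cb + step) % 5])
        elif ca == cb:  # rule 2: same column, shift vertically with wrap-around
            out.append(SQUARE[(ra + step) % 5][ca] + SQUARE[(rb + step) % 5][cb])
        else:           # rule 3: rectangle, keep the row and swap the columns
            out.append(SQUARE[ra][cb] + SQUARE[rb][ca])
    return " ".join(out)

CIPHERTEXT = "MEIKQOTXCQ TEZXCOMWQC TEHNFBIKMEHA KRQCUNGIKMAV"
plain = transform(CIPHERTEXT, -1)
print(plain)
# LA BO UL AY EL AD YW IL LX LE AD TO CI BO LA TE MP LE SO FG OL DX
```

Calling `transform(plain, +1)` maps the plaintext digraphs back to the original ciphertext, which is a convenient check that the rules were applied consistently.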

Hence, the plaintext of the message which Thomas Gates successfully decrypted reads as follows:

LA BO UL AY EL
AD YW IL LX LE
AD TO CI BO LA TE
MP LE SO FG OL DX

If the superfluous “X” characters are dropped and the word spacing is restored, the resulting message is “Laboulaye lady will lead to Cibola temples of gold”. In the movie, this hint refers to the French Statue of Liberty, which is actually the sister statue of the American Statue of Liberty, whose intellectual creator was the French politician Édouard René de Laboulaye. Cíbola is one of the fabled Seven Cities of Gold existing only in a myth that originated around the year 1150 when the Moors conquered Mérida, Spain. The legend of the seven cities of gold survived for many centuries and even drew the Conquistadors northward until they encountered the French colonists, who successfully resisted their further advance. In the movie National Treasure: Book of Secrets, Thomas Gates’ descendant Ben Gates finally manages to rediscover the mythical temples of gold in a huge cave under Mount Rushmore.

However, the final question remains whether a ciphertext-only attack on the short ciphertext given in the movie would be feasible. The usual entry point for cryptanalysis of the Playfair system is a frequency analysis of the ciphertext’s digraphs. Unfortunately, the above plaintext doesn’t contain even one of the ten most frequent English digraphs: th, he, in, er, an, re, nd, at, on, nt. Another way of attacking the Playfair cipher exploits the fact that if a letter pair AB is encrypted to CD, then the pair BA is always encrypted to DC. Thus finding such pairs in the ciphertext (e.g., “CQ … QC … QC”) may prove highly fruitful. But again, the corresponding plaintext digraphs “EL” and “LE” have relatively low frequencies in ordinary English texts, whereas frequent reversible pairs such as “ER” and “RE” don’t appear in the plaintext at all.

An obvious weakness of the Playfair cipher (especially if the keyword is relatively short) is the fact that in many cases the Playfair square ends with “XYZ”. In the given example, the situation is even worse as the last row equals the end of the alphabet, “VWXYZ”.

Another way of breaking the above ciphertext might be the so-called shotgun hill climbing method in combination with the massive computing power of modern computers. This method takes an educated guess as the basis for the initial square (e.g., “VWXYZ” as the last row) and employs suitable metrics (e.g., frequency counts) to find promising mutations of the Playfair square, which ultimately leads to an approximate solution. However, the unusual nature of the given plaintext “Laboulaye lady will lead to Cibola temples of gold” makes it very hard to choose good metrics, and even promising ciphertext fragments like “ME IK … IK ME …” probably won’t lead anywhere if the cryptanalyst doesn’t have any initial idea about the cleartext’s content (e.g., “LA BO ul ay [e]” and “ci BO LA”).
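
The observations about repeated and reversed digraphs, and about the missing frequent digraphs, can be checked with a short Python sketch. The variable names are my own, and the list of ten frequent digraphs is simply the one quoted above, not an authoritative frequency table.

```python
from collections import Counter

CIPHER = "ME IK QO TX CQ TE ZX CO MW QC TE HN FB IK ME HA KR QC UN GI KM AV".split()
PLAIN = "LA BO UL AY EL AD YW IL LX LE AD TO CI BO LA TE MP LE SO FG OL DX".split()
TOP_TEN = {"TH", "HE", "IN", "ER", "AN", "RE", "ND", "AT", "ON", "NT"}

counts = Counter(CIPHER)

# Ciphertext digraphs that occur more than once.
repeated = {d: n for d, n in counts.items() if n > 1}

# Digraphs whose mirror image also occurs (AB -> CD implies BA -> DC).
reversed_pairs = sorted({min(d, d[::-1]) + "/" + max(d, d[::-1])
                         for d in counts if d[::-1] in counts})

print(repeated)                # {'ME': 2, 'IK': 2, 'TE': 2, 'QC': 2}
print(reversed_pairs)          # ['CQ/QC']
print(TOP_TEN & set(PLAIN))    # set()
```

With only 22 digraphs, of which 18 are distinct, there is very little statistical material to work with.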

In conclusion, the short length of the given ciphertext would have made it virtually impossible for the fictional character Thomas Gates to break the encryption by classical means in 1865. Nowadays, the ability to run attacks like the shotgun hill climbing method on powerful computers allows for feasible attacks even in the case of short and unusual plaintexts. If you want to give it a try yourself, I recommend using the free software CrypTool, which is not only a great learning environment for cryptographic concepts but also provides many useful tools to attack classical ciphers.
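
For a rough idea of what such an attack involves, here is a minimal Python sketch of shotgun hill climbing. It is not how CrypTool or any other tool works internally; the function names, the number of restarts and steps, and the deliberately crude fitness metric (counting the ten frequent digraphs quoted earlier) are all my own assumptions. Real attacks use much richer statistics.

```python
import random

ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"
CIPHER = "MEIKQOTXCQTEZXCOMWQCTEHNFBIKMEHAKRQCUNGIKMAV"
COMMON = ("TH", "HE", "IN", "ER", "AN", "RE", "ND", "AT", "ON", "NT")

def decrypt(key, text):
    """Decrypt text with a 25-letter key string read row by row."""
    pos = {ch: divmod(i, 5) for i, ch in enumerate(key)}
    out = []
    for a, b in zip(text[0::2], text[1::2]):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:    # same row: take the letters to the left
            out += [key[ra * 5 + (ca - 1) % 5], key[rb * 5 + (cb - 1) % 5]]
        elif ca == cb:  # same column: take the letters above
            out += [key[((ra - 1) % 5) * 5 + ca], key[((rb - 1) % 5) * 5 + cb]]
        else:           # rectangle: swap the columns
            out += [key[ra * 5 + cb], key[rb * 5 + ca]]
    return "".join(out)

def score(text):
    """Crude fitness: occurrences of the ten frequent digraphs."""
    return sum(text.count(d) for d in COMMON)

def climb(rng, steps=2000):
    """One hill climb from a random square; returns (score, key)."""
    key = rng.sample(ALPHABET, 25)
    best = score(decrypt("".join(key), CIPHER))
    for _ in range(steps):
        i, j = rng.sample(range(25), 2)
        key[i], key[j] = key[j], key[i]      # mutate: swap two cells
        s = score(decrypt("".join(key), CIPHER))
        if s >= best:
            best = s                         # keep the mutation
        else:
            key[i], key[j] = key[j], key[i]  # undo the mutation
    return best, "".join(key)

def shotgun(restarts=20, seed=0):
    """Restart the climb from several random squares and keep the best result."""
    rng = random.Random(seed)
    return max(climb(rng) for _ in range(restarts))

best_score, best_key = shotgun()
print(best_score, best_key)
print(decrypt(best_key, CIPHER))
```

The true key “DEATHBCFGIKLMNOPQRSUVWXYZ” scores only 1 under this metric (the single “AT” in “BOLATEMPLES”), so a climb can easily settle on a wrong square that scores higher. That is exactly the difficulty with unusual plaintexts described above.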

Finally, it might be interesting to know that officials found a Vigenère tableau[8] in the room of the historical John Wilkes Booth after he had shot Abraham Lincoln. As the Confederacy used the Vigenère cipher in conjunction with cipher disks during the Civil War[9], the prosecution in the trial against eight Southern sympathizers sought to show that Booth’s Vigenère tableau proved the Confederate government’s involvement in Lincoln’s assassination.

Footnotes:
  1. The Playfair cipher is sometimes referred to as the Playfair code, which is factually wrong because codes operate on the level of linguistic entities, whereas ciphers do not.
  2. Lyon Playfair, 1st Baron Playfair; 1818-1898.
  3. I and J are treated as one letter and no duplicate letters are allowed within the same digraph, so 25*24=600.
  4. No character of the alphabet is entered more than once.
  5. The movie incorrectly states that a Playfair keyword must have exactly five letters, but actually any length works.
  6. I and J are treated as one letter.
  7. The inverse of these four rules was used to encrypt the message in the first place.
  8. The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Caesar ciphers based on the letters of a keyword. It is a simple form of polyalphabetic substitution.
  9. The Confederacy’s primary keywords were “Manchester Bluff”, “Complete Victory” and, as the war came to a close, “Come Retribution”.


6 Comments
  1. Travis says:

    How come there isn’t a letter ‘J’ in the playfair square?

    • me says:

      because if there was a j then it wouldn’t fit into a 5×5 square and j is one of the least used letters… it makes sense if you think about it…

  2. Travis says:

    Never mind. I get it… you’re supposed to leave the ‘J’ out.

  3. Kurt says:

    I love the Playfair cypher and when I watched the movie I paused it and wrote down the letters. I tried to solve it.
    First I tried a trigraph brute-forcer program (fairplay.exe) It didn’t work. The message is too short and the distribution of letters is all wrong.
    Then I tried DEATH as the key word. I solved the cypher in 15 minutes. :)

  4. Truden says:

    Not sure that this is true:), but thanks for a post.
    Truden

