Enigma Machine

Enigma Machine
German Enigma Machine

Tuesday, February 16, 2010

Attacking Polyalphabetic Ciphers

Some time ago, I promised an article discussing the cryptanalysis of polyalphabetic ciphers like playfair. This post will talk a little bit about these methods.


Although the approcah to the analysis of polyalphabetic ciphers is similar to that of monoalphabetic ciphers, because of their nature, these ciphers are harder to crack. The main reasons for this are:



  1. There is no longer a 1 to 1 mapping of characters.

  2. Most of these ciphers involve a 'period' where the mapping of characters are 1 to 1 (i.e. every 5th character has the same map)


Looking at ciphers which use multiple character mappings (i.e. Vignere ciphers), a keyword is selected and the amount a character is shifted in the alphabet is dictated by successive letters in the keyword. Thus, with a short key like 'CAB', 3 alphabets are used and every third character is shifted by the same amount. It is pretty easy to see that a longer key would make the cipher text harder to analyze.


For instance, 'PROTECT' has a 7 character period (even though there is a duplicated character 'T'). In order to decrypt a message, the first thing that has to happen is that the period must be determined. This can be done by counting character occurances and matching frequencies with the suspected source language's character frequencies (i.e. English). However, we now have to do several sets of counting.


Start by selecting every other character and doing a monoalphabetic character frequency attack on just those letters. If the percentages of each character closely match the frequency of english characters (you get to define what is meant by 'closely matches'), then the 'period' of the key is likely 2. However, to be sure, you are going to want to try periods from 2 through at least 7 or 8 so continue the same process using every 3rd character, then every 4th character and so on. Once you have done this, look for the set that matches english character frequencies the closest. That is likely the 'period' of the key.


Now that you know the 'period', you partition the characters into groups. Suppose you determine the period of the key is 4 (the keyword is 4 characters long). The 1st group is every 4th character starting with the 1st character in the message (you already have the frequencies for this set). The second group is every 4th character starting with character 2. Repeat this starting at character 3 and then starting at character 4.


Now for each set, sort the letters by frequency and try substituting the most frequent letter in the english language ('e') for the letter in each set (from the ciphertext) that is its most frequent character. Repeat this with the second most frequent letter of each set. If you guessed correctly at the 'period', the plaintext was english, and the plaintext was not designed to avoid such frequency analysis, then you should be seeing patterns of characters that look like english.


From these results, you can determine what the 'shift' is for each of the 4 alphabets used and recover the 'key' that was used to encrypt the plaintext.
Playfair is a bit different. It doesn't really have a period but also does not have a 1 to 1 mapping of characters. Playfair maps pairs of characters to other pairs of characters which become the 'ciphertext'. Thus, analysis must be done on pairs of characters at a time. These pairs are called digraphs.


Digraph anaylsis can be done similar to character frequency analysis. However, you will need more ciphertext to reliably match digraphs to their plaintext equivalents. Similar to the concept of character frequencies, each language has sets of digraphs that occur more frequently than others.
Here is a resource that lists english character, digraph, and trigraph frequencies:
English Character Frequencies


There are about 600 digraphs in the english language but this page shows the most common 26. It should be obvious that playfair, like vignere, will take more ciphertext to get sufficient frequency counts of digraphs to make an attempt at decoding a message.

No comments: