Enigma Machine

Enigma Machine
German Enigma Machine

Monday, February 22, 2010

The German Enigma Machine

With the advent of mechanical technologies, ciphers became more sophisticated. Up until now, we have discussed simple character substitution and permutation ciphers. All of these could be encrypted and decrypted with paper and pencil. During World War II, a number of encryption machines were developed that allowed more sophisticated encryption systems that were far more difficult to break.
One of these is the Enigma machine developed and used by the German military. The machine’s design was brilliant for its time. It provided the ability to set one of 17,576 keys based on the positioning of 3 ‘rotors’.


The machine was composed of a keyboard with 26 keys, a lamp-board with 26 lamps (1 for each letter), a plugboard, and 3 rotors selected from a larger group of rotors.


The rotors mapped a character of the alphabet to a different character. The mappings were different for each rotor and they were not simple alphabet shifts. The Germans developed 5 standard rotors each with its own mapping. There were 3 additional rotors used by the Navy. At any one time, the machine was loaded with 3 of the rotors. The rotors were set to an initial position picked at random. The entire process is described below.


For each character type, an electrical signal passed through the plug-board. Next, the signal would pass through each of the 3 rotors performing 3 separate mappings. It would then be sent back by a ‘reflector’ into the 3rd rotor, 2nd rotor and 1st rotor in the opposite direction. This made a total of 6 separate mappings before illuminating one of the lamps. After each character, the far right rotor would rotate 1 position. Every 26th character, this rotor would force the 2nd rotor to click forward 1 position. This imitates an automobile’s odometer. Thus, the mappings would change for each character typed. Typing a series of the same character ‘AAAA…’ would result in a string of different characters for each input (i.e. ‘CZTBA…’).


Using the set of 5 standard rotors, there were 60 ways to configure a set of 3 out of the 5 rotors in the machine. Using the set of 8 rotores, there were 336 ways to configure 3 rotors. As mentioned above, there are 17,576 positions in which these rotors can be started (26 * 26 * 26). As described above, given the movement of the rotors (like a car odometer), the number of characters that could be encoded before the same ‘mapping’ occurred is 16,900 (26 * 26 * 25) giving the machine a ‘period’ of almost 17,000. Most messages were kept to several hundred characters so there was no chance of a mapping being repeated.


Machine setup involved running several lines (10-13 total) on the plug-board which mapped one letter to a different letter. This plugboard was similar to the old telephone operator’s board. Three rotors were selected from the set and positioned according to a random ‘indicator setting’ (i.e. 3-17-5 or CQE). The operator would also select 3 rotor positions to encode the message (i.e. 20-2-19 or TBS). The operator set the rotors to the indicator setting and encoded two copies of the message setting (TBSTBS). Next, the operator would reset the rotors to the message setting (TBS) and encode the message. The transmission would then include a preamble that contained the senders id, a time of transmission, and the clear text indicator setting (CQE). Following this was the 6 encoded characters representing the message indicator encrypted twice. Finally, the encoded message would be transmitted.


This has been a very quick description of a fairly complex machine. There are many more details and there was more than one version of the machine developed. Follow on enigma machines had 4 and 5 rotors adding to the number of keys and a longer period. There are some excellent resources on the web describing this machine.


Here are a few urls:



Nova online : How the Enigma Works

The Enigma Cipher Machine


And of course, the ‘Paper Enigma machine’

This last resource has a PDF that allows you to cut three ‘strips’ that imitate the rotors and place them in a paper framework. The result is a paper ‘enigma simulation’.

Tuesday, February 16, 2010

Attacking Polyalphabetic Ciphers

Some time ago, I promised an article discussing the cryptanalysis of polyalphabetic ciphers like playfair. This post will talk a little bit about these methods.


Although the approcah to the analysis of polyalphabetic ciphers is similar to that of monoalphabetic ciphers, because of their nature, these ciphers are harder to crack. The main reasons for this are:



  1. There is no longer a 1 to 1 mapping of characters.

  2. Most of these ciphers involve a 'period' where the mapping of characters are 1 to 1 (i.e. every 5th character has the same map)


Looking at ciphers which use multiple character mappings (i.e. Vignere ciphers), a keyword is selected and the amount a character is shifted in the alphabet is dictated by successive letters in the keyword. Thus, with a short key like 'CAB', 3 alphabets are used and every third character is shifted by the same amount. It is pretty easy to see that a longer key would make the cipher text harder to analyze.


For instance, 'PROTECT' has a 7 character period (even though there is a duplicated character 'T'). In order to decrypt a message, the first thing that has to happen is that the period must be determined. This can be done by counting character occurances and matching frequencies with the suspected source language's character frequencies (i.e. English). However, we now have to do several sets of counting.


Start by selecting every other character and doing a monoalphabetic character frequency attack on just those letters. If the percentages of each character closely match the frequency of english characters (you get to define what is meant by 'closely matches'), then the 'period' of the key is likely 2. However, to be sure, you are going to want to try periods from 2 through at least 7 or 8 so continue the same process using every 3rd character, then every 4th character and so on. Once you have done this, look for the set that matches english character frequencies the closest. That is likely the 'period' of the key.


Now that you know the 'period', you partition the characters into groups. Suppose you determine the period of the key is 4 (the keyword is 4 characters long). The 1st group is every 4th character starting with the 1st character in the message (you already have the frequencies for this set). The second group is every 4th character starting with character 2. Repeat this starting at character 3 and then starting at character 4.


Now for each set, sort the letters by frequency and try substituting the most frequent letter in the english language ('e') for the letter in each set (from the ciphertext) that is its most frequent character. Repeat this with the second most frequent letter of each set. If you guessed correctly at the 'period', the plaintext was english, and the plaintext was not designed to avoid such frequency analysis, then you should be seeing patterns of characters that look like english.


From these results, you can determine what the 'shift' is for each of the 4 alphabets used and recover the 'key' that was used to encrypt the plaintext.
Playfair is a bit different. It doesn't really have a period but also does not have a 1 to 1 mapping of characters. Playfair maps pairs of characters to other pairs of characters which become the 'ciphertext'. Thus, analysis must be done on pairs of characters at a time. These pairs are called digraphs.


Digraph anaylsis can be done similar to character frequency analysis. However, you will need more ciphertext to reliably match digraphs to their plaintext equivalents. Similar to the concept of character frequencies, each language has sets of digraphs that occur more frequently than others.
Here is a resource that lists english character, digraph, and trigraph frequencies:
English Character Frequencies


There are about 600 digraphs in the english language but this page shows the most common 26. It should be obvious that playfair, like vignere, will take more ciphertext to get sufficient frequency counts of digraphs to make an attempt at decoding a message.

Sunday, May 24, 2009

Polyalphabetic Ciphers - Playfair

Polyalphabetic Ciphers - Playfair

Polygraphic ciphers allow the encoding of multiple characters at one time (rather than substituting a single character at a time). There are a number of schemes that encode 2 or more characters simulataneously. One of the most popular of these is the Playfair cipher.

Playfair used a 5x5 box of characters. A keyword is chosen (with all unique characters.) The remaining characters of the alphabet are listed in alphabetical order left to right and top to bottom. One more note, the 5x5 box fits only 25 charcters. The letters i and j share a cell. As an example, using the keyword sleuth, the box would be:

S L E U T
H A B C D
F G IJ K M
N O P Q R
V W X Y Z

The message to be encoded, is divided into pairs of letters. These pairs are called a digraph. If there is a repeated letter in a digraph, an x is inserted between the letters and the digraphs are reorganized. Also, in cases where the messages have an odd number of characters, an x is added at the end.

Now, there are three different ways the letters can be related in the 5x5 encoding box. They can:
1.Both be in the same row
2.Both be in the same column
3.Be neither in the same row nor the same column.

If they are in the same row, substitute the next character in that row for each of the two characters. If you are at the end of a row, wrap around to the first character in the same row. Thus, UT becomes TS.

If the letters are in the same column, use the character below each letter in the box. If the letter is at the bottom of a column, use the character at the top of the column. For example, GO becomes OW.

If the letters are in different rows and different columns, then do the following:
1.Starting at the first letter, scan across the row until you are in the same column as the second letter (remember you can wrap around to the beginning of the row if this letter is further 'right' in the box). Use that letter as the ciphertext.
2.Starting at the second letter, scan across its row until you are in the same column as the first letter (again wrapping around if necessary).

Here is an example of encoding that digraph: If you have NE as your plain text, start with N and travel to the right until you are in the same column as E (column 3). The letter is P. Now starting at E, scan to the right (and wrap around) to end over the letter N (column 1). The letter is S. So NE becomes PS.
Note that in this case, the two plaintext letters and the resulting ciphertext letters form a small box within the 5x5 encoding box.

Let's try a full message. With the keyword SLEUTH, we now encrypt the message:
This is so silly. It is elementary my dear watson.

Removing punctuation and pairing letters, we get:
TH IS IS SO SI LX LY IT IS EL EM EN TA RY MY DE AR WA TS ON

And the resulting cipher is:

SD FE FE LN EF EW UW ME FE UE TI SP LD QZ KZ BT DO LG SL PO

Regrouping as we should in collections of 5 letters we get:

SDFEF ELNEF EWUWM EFEUE TISPL DQZKZ BTDOL GSLPO

This is quite a mess, but a cryptanalyst does have some ways to try and break the encoding. We will look at those techniques in a future column.

Friday, May 15, 2009

Polyalphabetic Ciphers




Well, I've been gone for a while. Things have been busy. I'm going to try to be more regular at this again.




So let's start talking about more substitution ciphers. This time, we'll look at Polyalphabetic ciphers. Remember that Monoalphabetic substitution ciphers used an alphabet that mapped your plaintext characters into the associated ciphertext character. But since there was only one map or table, the same substitution takes place each time a particular character shows up (i.e. 's' always becomes 't' when the alphabet is shifted by 1 place). This is a problem because it allows someone trying to break your code to use character frequency counting to guess at which characters translate to certain frequently used characters. Also, doubled characters are easy to pick out (i.e. 'see' becomes 'tff' since 'e' always becomes 'f').


Starting in the 1500s and 1600s, some people began devising ways of using multiple alphabets within an encoded message (hence, the term 'polyalphabetic'). These maps or tables were either fixed (as with Trimethius' Tabula Recta) or could be based on key words or phrases (Vignere's table). The former used 24 alphabets each shifted 1 additional character to the left. The characters 'i' and 'j' were considered equivalent as well as 'u' and 'v' leaving 24 different alphabets. Successive characters in a message used the next alphabet down for translation. After 24 characters, the encoding with start with the original alphabet again. This masked patterns found in monoalphabetic methods (i.e. the double character problem). Vignere's table as a square with the alphabet across the top and down the left side. A keyword or phrase would be picked and that would be used as the key 'alphabet' to use for each successive character. The keyword was repeated as many times as needed to cover the original message.



Here is part of the table:




a b c d e f g h i …
A A B C D E F G H I …
B B C D E F G H I J …
C C D E F G H I J K …
D D E F G H I J K L
...


The full table is 26x26. The labels on the top (lowercase) are the letters from the plaintext. The letters along the side are from the 'keyword' and select the alphabet used for each successive letter. Using the keyword 'CAB', we can encode the word 'cage' as follows:




c a g e
C A B C
e a h g


The final ciphertext is 'eahg'. For longer messages, this would skew character count statistics. Additionally, using a longer keyword or phrase would be much stronger. You may want to write out the whole 26x26 table and try this with a much longer message and a longer keyword or phrase. Next time, we will look at other polyalphabetic ciphers with some more interesting properties.



Enjoy.

Tuesday, December 23, 2008

Attacking Transposition Ciphers

In this entry, I'll talk about brute-force attacks against Transposition ciphers. A brute-force attack is one where the attacker simply tries all the possible combinations (or keys) to decipher a message. Generally, brute-force attacks are impractical as they do not use any analysis to round down unlikely candidate keys or combinations.

In order for an attacker to decipher this message, they must first guess the size of the groupings (reasonable values might be from 4 to 10). Next, they have to guess the order in which the letters were scrambled. This type of scrambling described above is called a permutation. We can figure out how many different permutations there are by noting the following:
1. For the first letter position you pick, you get to select from all positions (so in our example, there are 7 possible choices.)
2. For the next letter position you pick, you have one less than the whole group (so 6 choices)
3. This continues until your last pick which is limited to 1 character position (whatever position is left).
Thinking about this, for 7 characters, the number of permutations is:
7 * 6 * 5 * 4 * 3 * 2 * 1 = 5040.

So even if the attacker knows the letters are in groups of 7 characters, they have to try as many as 5040 orderings to unscramble them. On the average, we can expect to have to try about half that many combinations or about 2520.
But again, the messages creator could have selected a different goruping size. That means that we would have the following numbers of permutations to try if we were pretty sure the group size was between 4 and 10:

4 - 24 permutations
5 - 120 permutations
6 - 720 permutations
7 - 5040 permutations
8 - 40320 permutations
9 - 362880 permutations
10 - 3628800 permutations

You can see how quickly the number of permutations groups each time we add 1 to the size of the grouping. The work of cracking a cipher like this by hand becomes intractable very quickly. With the help of software that could generate permutations quickly, you are left with a long set of character strings to read. So brute-force methods are still difficult to perform. The best brute-force method would involve having software generate the permutations and then look for common patterns from the english language. The software can 'score' each string according to how many patterns match. Then it can present the most likely candidates to a human cryptanalyst to examine in detail.

Monday, December 22, 2008

Transposition Ciphers

Transposition ciphers encode messages by simply rearranging the order of the characters in the message. As opposed to replacing letters with a substitute character. Note that what I've written about to date are types of Substitution Ciphers.

Transposition ciphers have keys with two parts. One is the size of the grouping of characters that are to be rearranged. The second is the ordering of the characters.

Let's try an example. We decide to rearrange groups of 7 characters. The rearrangement pattern is 6215734. The message is:

THE COURIER IS ENROUTE TO YOUR LOCATION

We start by group the characters in the size by which we are rearranging letters:

THECOUR IERISEN ROUTETO YOURLOC ATIONEE

Note we added some dummy letters to produce a multiple of 7 characters. Now, each group gets reordered by pulling characters in the order specified by our key (6215734).

We take character 6 (U), then character 2 (H), then character 1 (T), and so forth. The first group now becomes:

UHTOREC

We repeat this with the remaining groups and get:

UHTOREC EEISNRI TOREOUT OOYLCUR ETANEIO

Now regroup as sets of 5 characters.

UHTOR ECEEI SNRIT OREOU TOOYL CURET ANEIO

Think about what it might take for a cryptanalyst to decipher such a message. Next time, we will look at that problem a bit closer.

Tuesday, December 16, 2008

More Monoalphabetic Ciphers

The shift cipher is one of the simplest of the 'monoalphabetic' ciphers. Remember, this class of ciphers substitutes one encoded letter for each letter from the unencoded message. There is also a 1-to-1 correspondence between the alphabets used to transform a message into an encrypted message.

A variant of the shift cipher is the keyword-shift cipher. Recall that, for a shift cipher, we simply 'rotate' the alphabet several characters in a particular direction. The 'key' to the cipher is the number of positions you shift the alphabet. In a keyword-shift cipher, the 'key' is the chosen 'keyword'. You pick a word without any duplicate letters (remember that we want our alphabets to map each character to a unique character in the ciphertext alphabet. You start the new alphabet with the keyword, and then, list the remaining characters from a normal alphabet. Let's look at this with the keyword SIMPLE.

A - S
B - I
C - M
D - P
E - L
F - E
G - A
H - B
I - C
J - D
K - F
L - G
M - H
N - J
O - K
P - N
Q - O
R - Q
S - R
T - T
U - U
V - V
W - W
X - X
Y - Y
Z - Z

Given the above 'mapping', if you encode the message : HAPPY DAYS ARE HERE AGAIN. results in : BSNNY PSYRS QLBLQ LSASC J

Remember, we want to obscure word boundaries so we just regroup the message into 5 character groups.

You may notice that the last several characters translate to the same character. This is because the keyword uses letters up to the letter 'S' but no further. It is best if you can use a keyword that assures that no letter translates to itself. Thus, they keyword we picked was acceptable but could be improved by using letters later in the alphabet (ideally including 'Z').

In cryptography, a 'key' that does a poor job of scrambling data is called a weak key. Any key that does a good job of scrambling data is a strong key. So SIMPLE is not a weak key but is also not the best. BEAD would be a very weak key whereas ZEBRA would be a strong key (the Z forces the entire alphabet to shift by at least 1 character.)

Another monoalphabetic cipher is the random substitution cipher. Instead of shifting or using a short keyword, we scramble all 26 letters of the alphabet into a different ordering. Whereas a short keyword (which is easy to remember) can allow two people to create the correct mapping to encrypt and decrypt, this cipher requires 26 characters (the reordered alphabet) to be shared between people using it to communicate. That makes it a little less practical to use but it is harder to accidently use a weak key as in the type of cipher above.

Here is an example:

ZAXQSCWDVEFBRGNTHMYJUKILOP

I only showed the alphabet we are encoding 'to' here. To translate a message with this alphabet, all 'A's become Z, all 'B's become A, all 'C's become X, and so forth up to all 'Z's become P.

Try using this to encode a short message. Then work it backwards (Z->A, A->B, etc.) to see if you get back your original message again.

The actual ordering above isn't quite 'random'. There is a pattern to it. See if you can figure out what that is. I'll mention it a few entries down the road as it will nicely lead into other types of ciphers I want to talk about.