Talk like an Encryption: 2008

Tuesday, December 23, 2008

Attacking Transposition Ciphers

In this entry, I'll talk about brute-force attacks against Transposition ciphers. A brute-force attack is one where the attacker simply tries all the possible combinations (or keys) to decipher a message. Generally, brute-force attacks are impractical as they do not use any analysis to round down unlikely candidate keys or combinations.

In order for an attacker to decipher this message, they must first guess the size of the groupings (reasonable values might be from 4 to 10). Next, they have to guess the order in which the letters were scrambled. This type of scrambling described above is called a permutation. We can figure out how many different permutations there are by noting the following:
1. For the first letter position you pick, you get to select from all positions (so in our example, there are 7 possible choices.)
2. For the next letter position you pick, you have one less than the whole group (so 6 choices)
3. This continues until your last pick which is limited to 1 character position (whatever position is left).
Thinking about this, for 7 characters, the number of permutations is:
7 * 6 * 5 * 4 * 3 * 2 * 1 = 5040.

So even if the attacker knows the letters are in groups of 7 characters, they have to try as many as 5040 orderings to unscramble them. On the average, we can expect to have to try about half that many combinations or about 2520.
But again, the messages creator could have selected a different goruping size. That means that we would have the following numbers of permutations to try if we were pretty sure the group size was between 4 and 10:

4 - 24 permutations
5 - 120 permutations
6 - 720 permutations
7 - 5040 permutations
8 - 40320 permutations
9 - 362880 permutations
10 - 3628800 permutations

You can see how quickly the number of permutations groups each time we add 1 to the size of the grouping. The work of cracking a cipher like this by hand becomes intractable very quickly. With the help of software that could generate permutations quickly, you are left with a long set of character strings to read. So brute-force methods are still difficult to perform. The best brute-force method would involve having software generate the permutations and then look for common patterns from the english language. The software can 'score' each string according to how many patterns match. Then it can present the most likely candidates to a human cryptanalyst to examine in detail.

Monday, December 22, 2008

Transposition Ciphers

Transposition ciphers encode messages by simply rearranging the order of the characters in the message. As opposed to replacing letters with a substitute character. Note that what I've written about to date are types of Substitution Ciphers.

Transposition ciphers have keys with two parts. One is the size of the grouping of characters that are to be rearranged. The second is the ordering of the characters.

Let's try an example. We decide to rearrange groups of 7 characters. The rearrangement pattern is 6215734. The message is:

THE COURIER IS ENROUTE TO YOUR LOCATION

We start by group the characters in the size by which we are rearranging letters:

THECOUR IERISEN ROUTETO YOURLOC ATIONEE

Note we added some dummy letters to produce a multiple of 7 characters. Now, each group gets reordered by pulling characters in the order specified by our key (6215734).

We take character 6 (U), then character 2 (H), then character 1 (T), and so forth. The first group now becomes:

UHTOREC

We repeat this with the remaining groups and get:

UHTOREC EEISNRI TOREOUT OOYLCUR ETANEIO

Now regroup as sets of 5 characters.

UHTOR ECEEI SNRIT OREOU TOOYL CURET ANEIO

Think about what it might take for a cryptanalyst to decipher such a message. Next time, we will look at that problem a bit closer.

Tuesday, December 16, 2008

More Monoalphabetic Ciphers

The shift cipher is one of the simplest of the 'monoalphabetic' ciphers. Remember, this class of ciphers substitutes one encoded letter for each letter from the unencoded message. There is also a 1-to-1 correspondence between the alphabets used to transform a message into an encrypted message.

A variant of the shift cipher is the keyword-shift cipher. Recall that, for a shift cipher, we simply 'rotate' the alphabet several characters in a particular direction. The 'key' to the cipher is the number of positions you shift the alphabet. In a keyword-shift cipher, the 'key' is the chosen 'keyword'. You pick a word without any duplicate letters (remember that we want our alphabets to map each character to a unique character in the ciphertext alphabet. You start the new alphabet with the keyword, and then, list the remaining characters from a normal alphabet. Let's look at this with the keyword SIMPLE.

A - S
B - I
C - M
D - P
E - L
F - E
G - A
H - B
I - C
J - D
K - F
L - G
M - H
N - J
O - K
P - N
Q - O
R - Q
S - R
T - T
U - U
V - V
W - W
X - X
Y - Y
Z - Z

Given the above 'mapping', if you encode the message : HAPPY DAYS ARE HERE AGAIN. results in : BSNNY PSYRS QLBLQ LSASC J

Remember, we want to obscure word boundaries so we just regroup the message into 5 character groups.

You may notice that the last several characters translate to the same character. This is because the keyword uses letters up to the letter 'S' but no further. It is best if you can use a keyword that assures that no letter translates to itself. Thus, they keyword we picked was acceptable but could be improved by using letters later in the alphabet (ideally including 'Z').

In cryptography, a 'key' that does a poor job of scrambling data is called a weak key. Any key that does a good job of scrambling data is a strong key. So SIMPLE is not a weak key but is also not the best. BEAD would be a very weak key whereas ZEBRA would be a strong key (the Z forces the entire alphabet to shift by at least 1 character.)

Another monoalphabetic cipher is the random substitution cipher. Instead of shifting or using a short keyword, we scramble all 26 letters of the alphabet into a different ordering. Whereas a short keyword (which is easy to remember) can allow two people to create the correct mapping to encrypt and decrypt, this cipher requires 26 characters (the reordered alphabet) to be shared between people using it to communicate. That makes it a little less practical to use but it is harder to accidently use a weak key as in the type of cipher above.

Here is an example:

ZAXQSCWDVEFBRGNTHMYJUKILOP

I only showed the alphabet we are encoding 'to' here. To translate a message with this alphabet, all 'A's become Z, all 'B's become A, all 'C's become X, and so forth up to all 'Z's become P.

Try using this to encode a short message. Then work it backwards (Z->A, A->B, etc.) to see if you get back your original message again.

The actual ordering above isn't quite 'random'. There is a pattern to it. See if you can figure out what that is. I'll mention it a few entries down the road as it will nicely lead into other types of ciphers I want to talk about.

Wednesday, December 10, 2008

Attacking a Monoalphabetic (Substitution) Cipher

So, we can't believe that people would go through the extra effort to encrypt a message if there weren't other (nasty, evil, unauthorized) people that would have the desire to decrypt it. The art of attacking an encrypted message (trying to figure out what an encrypted message says) is called cryptanalysis.

Ok, let's admit this up front. Cryptanalysis (even for simple substitution ciphers) is a bit arduous and is not for everyone. So if you don't have a strong desire to understand how to 'break' a cipher, you can skim or skip this blog entry. If you are curious, at least read on a bit and see what the process is like.

So we have encrypted a message with a shift cipher. We cleverly regroup the letters so all the 'words' are five characters long. This leaves no clues based on word size for an attacker to take gueses at words or letters. What is left to do?

Well, if we know that the original message was written in english, then english (like all other languages) has some letters that are used more than others. For instance, the vowels, t, and s are the most frequently used letters in english. The letters x, q, and z are the least frequently used.

Terminology break:
To speak easier about this topic, we need to cover a couple more terms. Plaintext is the original unencrypted message. An encrypted message is called Ciphertext.

Back to the topic:
So, if we intercept a message and it contains enough ciphertext (and we know the original plaintext was english), we can try counting the characters and seeing which ones occur more frequently.

Look at the following message:

wrvsh dnhdv lhude rxwwk lvwrs lfzhq hhgwr fryhu dfrxs ohpru hwhpv sodlq whawl vwkhr uljlq doxhq fubsw hgphv vdjh

Ignoring spaces, the character counts are:

a 1
b 1
d 7
e 1
f 4
g 2
h 16
j 2
k 2
l 7
n 1
o 3
p 3
q 4
r 8
s 5
u 5
v 7
w 10
x 3
y 1
z 1

Total 94

In the english language, e is the most frequently used letter. About 12.7% of characters in normal english prose will be an 'e'. Even without working out strict percentages, we can see that 'h' is a very good candidate based on how often it occurs. 16/94 = 17%. That's a bit high but we would likely guess that h = e. 'w' comprises 10.6% of the encoded message. The next highest character frequency in english is 't'. On average 't' makes up 9% of the characters in english prose. That would lead us to guess that w = t.

We now start replacing characters with our guesses and see if things start to make sense. Try it on a piece of paper. Write an 'e' over every 'h' and a 't' over every 'w'.

You will find you do not have much to go on yet. Ok, look at 'r'. It occurs 8 times so 8/94 = 8.5%. 'a' occurs 8.17% and 'o' occurs 7.5% so r is likely one of these.

Going on like this, we can progressively fill in letters. If we find that a letter doesn't make sensible words when it is inserted, we simply try the next letter in frequency to see if that makes sense. By the way, english character frequencies are well known and published. Here is one source:

http://en.wikipedia.org/wiki/Letter_frequencies

Anyhow, we would likely find that 'a' does not work for 'r' but that 'o' does. If we look carefully at how these letters are related, we see that this is, in fact, the ceasar cipher (the alphabet shifted 3 characters.) t=w, e=h, o=r, etc. We can quickly jump ahead and convert every character to the one 3 characters earlier. What we would find is part of the text in our terminology section above.

This process, as mentioned above, is called 'cryptanalysis'. It is time consuming and difficult so it is not for everyone. I hope this hasn't scared you off. Next time we will go back to talking about another type of character based encryption.

Monday, December 8, 2008

Shift Ciphers

First, let's look at the terminology of encryption. When we say encrypt we mean to scramble a message so it is unreadable by anyone but the intended recipient. To decrypt a messages, means to unscramble a scrambled message.

Most encryption methods have a special piece of information that is kept secret and it is shared between the two people wanting to talk securely. That special information is called a key.

So one of the easist ciphers to understand and use is known as a shift cipher. Here we show two copies of the English alphabet. The second copy is shifted (and we wrap the extra characters around the back end):

ABCDEFGHIJKLMNOPQRSTUVWXYZ

DEFGHIJKLMNOPQRSTUVWXYZABC

Now, if we agree with a friend that we are going to shift our alphabet by 3 places, one of us can encrypt a message by 'looking up' the next letter of the message on the top alphabet and writing the corresponding letter from the bottom alphabet. Similarly, when the receiving party wants to decrypt a message, they lookup each character in the encoded message on the bottom alphabet and write the corresponding letter from the top alphabet.

Let's try a message. By the way, with the above alphabets, we can consider that special secret information (the key) to be '3' since that is the number of letter positions by which the alphabet is shifted. Encoding the message:

WE WILL ATTACK AT DAWN

is translated to :

ZH ZLOO DWWDFN DW GDZQ

Not too bad. But anyone who likes the Sunday newspaper cryptograms knows how to try to decode this. They would look at the smaller letter groups and make guesses as to the words they represent. There are a limited number of short words in english (i.e. IF, OF, TO, ON, UP, AT, etc.) Also, because the two groupings ZH, and DW don't share any letters, you know, for instance, that if one of the words is OF, the other cannot be ON, TO, OR, SO, or NO. In general, once the encryption is complete, you want to write the letters in same size groups as follows:

ZHZLO ODWWD FNDWG DZQZH

Note that we added a couple of characters. This will not matter to the receiver. They have the key and will quickly decrypt the message. Seeing the extra letters at the end will easily be recognized as some 'padding' of the message and be disregarded.

So, shift ciphers like this are of a class of codes known as Monoalphabetic ciphers. This simply means 1 letter 'maps' to 1 letter. These are the easiest codes to break even in the absence of knowing the 'key'

One more note: You can shift the alphabet by anywhere from 1 to 25 places. Shifting the alphabet by exactly 3 characters has a special name. It is a Ceasar cipher as it was created by Julius Ceasar for use in communicating with troops during many of their wars.

Next time, I will try to talk about how to attack this cipher.

Thursday, December 4, 2008

Ok, Let's give this a try

I've been thinking about blogging for about 6 months now but didn't know what to write about. I tend not to spend time on hobbies...I'm just too busy. I wanted to write something that might be useful to people other than myself.

Well, cryptography has been a passion of mine for the past couple of decades. I've worked in the field, read articles for fun, read about older ciphers. It's not everyone's cup of tea. Most people believe Crypto is HARD. So what I'd like to do is try to disprove this complaint. As I get time, I plan on writing about all sorts of things related to ciphers, secret communication, etc. Stuff from the earliest character based ciphers up to current strong encryption and public key infrastructures (oh, err... certificates). It might take a while to get to those more advanced topics. I'll try to make these entries interesting, understandable, and fun. We'll see where it goes.

Here are a couple of books I've enjoyed about cryptography. These are not heavily technical. Both are more about the history of ciphers than the technology.

"Crypto" by Steven Levy
http://www.stevenlevy.com/index.php/other-books/crypto

"The Code Book" by Simon Singh
http://www.simonsingh.net/The_Code_Book.html

The latter has some technical information but still focuses on history.

Enjoy.

Enigma Machine