Previous section   Next section

9.2 Classical Cryptosystems

Classical cryptosystems (also called single-key or symmetric cryptosystems) are cryptosystems that use the same key for encipherment and decipherment. In these systems, for all Ek C and k K, there is a Dk D such that Dk = Ek–1.

EXAMPLE: The Caesar cipher discussed earlier had a key of 3, so the enciphering function was E3. To decipher "KHOOR," we used the same key in the decipherment function D3. Hence, the Caesar cipher is a classical cipher.

There are two basic types of classical ciphers: transposition ciphers and substitution ciphers.

9.2.1 Transposition Ciphers

A transposition cipher rearranges the characters in the plaintext to form the ciphertext. The letters are not changed.

EXAMPLE: The rail fence cipher is composed by writing the plaintext in two rows, proceeding down, then across, and reading the ciphertext across, then down. For example, the plaintext "HELLO, WORLD" would be written as:

HLOOL

ELWRD

resulting in the ciphertext "HLOOLELWRD."

Mathematically, the key to a transposition cipher is a permutation function. Because the permutation does not alter the frequency of plaintext characters, a transposition cipher can be detected by comparing character frequencies with a model of the language. If, for example, character frequencies for 1-grams match those of a model of English, but 2-gram frequencies do not match the model, then the text is probably a transposition cipher.

Attacking a transposition cipher requires rearrangement of the letters of the ciphertext. This process, called anagramming, uses tables of n-gram frequencies to identify common n-grams. The cryptanalyst arranges the letters in such a way that the characters in the ciphertext form some n-grams with highest frequency. This process is repeated, using different n-grams, until the transposition pattern is found.

EXAMPLE: Consider the ciphertext "HLOOLELWRD." According to a Konheim's digram table [590], the digram "HE" occurs with frequency 0.0305[1] in English. Of the other possible digrams beginning with "H," the frequency of "HO" is the next highest, at 0.0043, and the digrams "HL," "HW," "HR," and "HD" have frequencies of less than 0.0010. Furthermore, the frequency of "WH" is 0.0026, and the digrams "EH," "LH," "OH," "RH," and "DH" occur with frequencies of 0.0002 or less. This suggests that "E" follows "H." We arrange the letters so that each letter in the first block of five letters (from "H" up to but not including the "E") is adjacent to the corresponding letter in the second block of five letters, as follows.

HE

LL

OW

OR

LD

Reading the letters across and down produces "HELLOWORLD." Note that the shape of the arrangement is different from that in the previous example. However, the two arrangements are equivalent, leading to the correct solution.

[1] This means that in Konheim's sample, 3.05% of the digrams were "HE."

9.2.2 Substitution Ciphers

A substitution cipher changes characters in the plaintext to produce the ciphertext.

EXAMPLE: The Caesar cipher discussed earlier had a key of 3, altering each letter in the plaintext by mapping it into the letter three characters later in the alphabet (and circling back to the beginning of the alphabet if needed). This is a substitution cipher.

A Caesar cipher is susceptible to a statistical ciphertext-only attack.

EXAMPLE: Consider the ciphertext "KHOOR ZRUOG." We first compute the frequency of each letter in the ciphertext:

G

0.1

H

0.1

K

0.1

O

0.3

R

0.2

U

0.1

Z

0.1

We now apply the character-based model. Let f(i) be the correlation of the frequency of each letter in the ciphertext with the character frequencies in English (see Figure 9-1). Let f(c) be the frequency of character c (expressed as a fraction). The formula for this correlation for this ciphertext (with all arithmetic being mod 26) is

f(i) = S0 c 25 f(c)p(ci) = 0.1p(6 – i) + 0.1p(7 – i) + 0.1p(10 – i) + 0.3p(14 – i) + 0.2p(17 – i) + 0.1p(20 – i) + 0.1p(25 – i)

This correlation should be a maximum when the key k translates the ciphertext into English. Figure 9-2 shows the values of this function for the values of i. Trying the most likely key first, we obtain as plaintext "EBIIL TLOIA" when i = 6, "AXEEH PHKEW" when i = 10, "HELLO WORLD" when i = 3, and "WTAAD LDGAS" when i = 14.

Figure 9-2. The value of f(i) for 0 i 25 using the model in Figure 9-1.

graphics/09fig02.gif

The example above emphasizes the statistical nature of this attack. The statistics indicated that the key was most likely 6, when in fact the correct key was 3. So the attacker must test the results. The statistics simply reduce the number of trials in most cases. Only three trials were needed, as opposed to 13 (the expected number of trials if the keys were simply tried in order).

EXAMPLE: Using Konheim's model of single-character frequencies [590], the most likely keys (in order) are i = 6, i = 10, i = 14, and i = 3. Konheim's frequencies are different than Denning's, and this accounts for the change in the third most probable key.

9.2.2.1 Vigenère Cipher

A longer key might obscure the statistics. The Vigenère cipher chooses a sequence of keys, represented by a string. The key letters are applied to successive plaintext characters, and when the end of the key is reached, the key starts over. The length of the key is called the period of the cipher. Figure 9-3 shows a tableau, or table, to implement this cipher efficiently. Because this requires several different key letters, this type of cipher is called polyalphabetic.

Figure 9-3. The Vigenère tableau.

graphics/09fig03.jpg

EXAMPLE: The first line of a limerick is enciphered using the key "BENCH," as follows.

Key

B ENCHBENC HBENC HBENCH BENCHBENCH

Plaintext

A LIMERICK PACKS LAUGHS ANATOMICAL

Ciphertext

B PVOLSMPM WBGXU SBYTJZ BRNVVNMPCS

The index of coincidence measures the differences in the frequencies of the letters in the ciphertext. It is defined as the probability that two randomly chosen letters from the ciphertext will be the same. Let Fc be the frequency of cipher character c, and let N be the length of the ciphertext. It can be shown (see Exercise 7) that the index of coincidence IC is graphics/09infig03.gif. Figure 9-4 shows the expected values of IC for several periods. The lower the index of coincidence, the less variation in the characters of the ciphertext and (from our model of English) the longer the period of the cipher.

Figure 9-4. Indices of coincidences for different periods. From Denning [269], Table 2.2, p. 78.

graphics/09fig04.gif

For many years, the Vigenère cipher was considered unbreakable. Then a Prussian cavalry officer named Kasiski noticed that repetitions occur when characters of the key appear over the same characters in the ciphertext. The number of characters between the repetitions is a multiple of the period.

EXAMPLE: Let the message be THE BOY HAS THE BAG and let the key be VIG. Then:

Key

VIGVIGVIGVIGVIG

Plaintext

THEBOYHASTHEBAG

Ciphertext

OPKWWECIYOPKWIM

In the ciphertext, the string OPK appears twice. Both are caused by the key sequence VIG enciphering the same ciphertext, THE. The ciphertext repetitions are nine characters apart. Hence, 9 is a multiple of the period (which is 3 here).

We examine the ciphertext for multiple repetitions and tabulate their length and the number of characters between successive repetitions. The period is likely to be a factor of the number of characters between these repetitions. From the repetitions, we establish the probable period, using the index of coincidence to check our deduction. We then tabulate the characters for each key letter separately and solve each as a Caesar cipher.

EXAMPLE: Consider the Vigenère cipher

ADQYS

MIUSB

OXKKT

MIBHK

IZOOO

EQOOG

IFBAG

KAUMF

VVTAA

CIDTW

MOCIO

EQOOG

BMBFV

ZGGWP

CIEKQ

HSNEW

VECNE

DLAAV

RWKXS

VNSVP

HCEUT

QOIOF

MEGJS

WTPCH

AJMOC

HIUIX

      

Could this be a Caesar cipher (which is a Vigenère cipher with a key length of 1)? We find that the index of coincidence is 0.043, which indicates a key of length 5 or more. So we assume that the key is of length greater than 1, and apply the Kasiski method. Repetitions of two letters or more are as follows.

Letters

Start

End

Gap length

Factors of gap length

MI

5

15

10

2, 5

OO

22

27

5

5

OEQOOG

24

54

30

2, 3, 5

FV

39

63

24

2, 2, 2, 3

AA

43

87

44

2, 2, 11

MOC

50

122

72

2, 2, 2, 3, 3

QO

56

105

49

7, 7

PC

69

117

48

2, 2, 2, 2, 3

NE

77

83

6

2, 3

SV

94

97

3

3

CH

118

124

6

2, 3

The longest repetition is six characters long; this is unlikely to be a coincidence. The gap between the repetitions is 30. The next longest repetition, MOC, is three characters long and has a gap of 72. The greatest common divisor of 30 and 72 is 6. Of the 11 repetitions, six have gaps with a factor of 6. The only factors that occur more in the gaps are 2 (in eight gaps) and 3 (in seven gaps). As a first guess, let us try 6.

To verify that this is reasonable, we compute the index of coincidence for each alphabet. We first arrange the message into six columns.

A

D

Q

Y

S

M

I

U

S

B

O

X

K

K

T

M

I

B

H

K

I

Z

O

O

O

E

Q

O

O

G

I

F

B

A

G

K

A

U

M

F

V

V

T

A

A

C

I

D

T

W

M

O

C

I

O

E

Q

O

O

G

B

M

B

F

V

Z

G

G

W

P

C

I

E

K

Q

H

S

N

E

W

V

E

C

N

E

D

L

A

A

V

R

W

K

X

S

V

N

S

V

P

H

C

E

U

T

Q

O

I

O

F

M

E

G

J

S

W

T

P

C

H

A

J

M

O

C

H

I

U

I

X

  

Each column represents one alphabet. The indices of coincidence are as follows.

Alphabet #1:

IC = 0.069

Alphabet #4:

IC = 0.056

Alphabet #2:

IC = 0.078

Alphabet #5:

IC = 0.124

Alphabet #3:

IC = 0.078

Alphabet #6:

IC = 0.043

All indices of coincidence indicate a single alphabet except for the ICs associated with alphabets #4 (period between 1 and 2) and #6 (period between 5 and 10). Given the statistical nature of the measure, we will assume that these are skewed by the distribution of characters and proceed on the assumption that there are six alphabets, and hence a key of length 6.

Counting characters in each column (alphabet) yields:

Column

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

#1

3

1

0

0

4

0

1

1

3

0

1

0

0

1

3

0

0

1

1

2

0

0

0

0

0

0

#2

1

0

0

2

2

2

1

0

0

1

3

0

1

0

0

0

0

0

1

0

4

0

4

0

0

0

#3

1

2

0

0

0

0

0

0

2

0

1

1

4

0

0

0

4

0

1

3

0

2

1

0

0

0

#4

2

1

1

0

2

2

0

1

0

0

0

0

1

0

4

3

1

0

0

0

0

0

0

2

1

1

#5

1

0

5

0

0

0

2

1

2

0

0

0

0

0

5

0

0

0

3

0

0

2

0

0

0

0

#6

0

1

1

1

0

0

2

2

3

1

1

0

1

2

1

0

0

0

0

0

0

3

0

1

0

1

An unshifted alphabet has the following characteristics (L meaning low frequency, M meaning moderate frequency, and H meaning high frequency).

H

M

M

M

H

M

M

H

H

M

M

M

M

H

H

M

L

H

H

H

M

L

L

L

L

L

We now compare the frequency counts in the six alphabets above with the frequency count of the unshifted alphabet. The first alphabet matches the characteristics of the unshifted alphabet (note the values for A, E, and I in particular). Given the gap between B and I, the third alphabet seems to be shifted with I mapping to A. A similar gap occurs in the sixth alphabet between O and V, suggesting that V maps to A. Substituting into the ciphertext (bold letters are plaintext) produces

ADIYS

RIUKB

OCKKL

MIGHK

AZOTO

EIOOL

IFTAG

PAUEF

VATAS

CIITW

EOCNO

EIOOL

BMTFV

EGGOP

CNEKI

HSSEW

NECSE

DDAAA

RWCXS

ANSNP

HHEUL

QONOF

EEGOS

WLPCM

AJEOC

MIUAX

      

In the last line, the group AJE suggests the word ARE. Taking this as a hypothesis, the second alphabet maps A into S. Substituting back produces

ALIYS

RICKB

OCKSL

MIGHS

AZOTO

MIOOL

INTAG

PACEF

VATIS

CIITE

EOCNO

MIOOL

BUTFV

EGOOP

CNESI

HSSEE

NECSE

LDAAA

RECXS

ANANP

HHECL

QONON

EEGOS

ELPCM

AREOC

MICAX

      

The last block suggests MICAL, because AL is a common ending for adjectives. This means that the fourth alphabet maps O into A, and the cipher becomes

ALIMS

RICKP

OCKSL

AIGHS

ANOTO

MICOL

INTOG

PACET

VATIS

QIITE

ECCNO

MICOL

BUTTV

EGOOD

CNESI

VSSEE

NSCSE

LDOAA

RECLS

ANAND

HHECL

EONON

ESGOS

ELDCM

ARECC

MICAL

      

In English, a Q is always followed by a U, so the I in the second group of the second line must map to U. The fifth alphabet maps M to A. The cipher is solved:

ALIME

RICKP

ACKSL

AUGHS

ANATO

MICAL

INTOS

PACET

HATIS

QUITE

ECONO

MICAL

BUTTH

EGOOD

ONESI

VESEE

NSOSE

LDOMA

RECLE

ANAND

THECL

EANON

ESSOS

ELDOM

ARECO

MICAL

      

With proper spacing and punctuation, we have

A LIMERICK PACKS LAUGHS ANATOMICAL
INTO SPACE THAT IS QUITE ECONOMICAL
   BUT THE GOOD ONES I'VE SEEN
   SO SELDOM ARE CLEAN,
AND THE CLEAN ONES SO SELDOM ARE COMICAL.

The key is ASIMOV.

It is worth noting that the Vigenère cipher is easy to break by hand. However, the principles of attack hold for more complex ciphers that can be implemented only by computer. A good example is the encipherments that several older versions of WordPerfect used [79, 83]. These allowed a user to encipher a file with a password. Unfortunately, certain fields in the enciphered file contained information internal to WordPerfect, and these fields could be predicted. This allowed an attacker to derive the password used to encipher the file, and from that the plaintext file itself.

9.2.2.2 One-Time Pad

The one-time pad is a variant of the Vigenère cipher. The technique is the same. The key string is chosen at random, and is at least as long as the message, so it does not repeat. Technically, it is a threshold scheme [907], and is provably impossible to break [122] (see also Section 32.3.3, "Perfect Secrecy"). The implementation issues of the pad, including random generation of the key and key distribution, do not concern us here (although a later chapter will touch on them).

9.2.3 Data Encryption Standard

The Data Encryption Standard (DES) [743] was designed to encipher sensitive but nonclassified data. It is bit-oriented, unlike the other ciphers we have seen. It uses both transposition and substitution and for that reason is sometimes referred to as a product cipher. Its input, output, and key are each 64 bits long. The sets of 64 bits are referred to as blocks.

The cipher consists of 16 rounds, or iterations. Each round uses a separate key of 48 bits. These round keys are generated from the key block by dropping the parity bits (reducing the effective key size to 56 bits), permuting the bits, and extracting 48 bits. A different set of 48 bits is extracted for each of the 16 rounds (see Figure 9-5). If the order in which the round keys is used is reversed, the input is deciphered.

Figure 9-5. DES key schedule generation. PC-1 and PC-2 are permutation tables; LSH is a table of left shifts (rotations).

graphics/09fig05.gif

The rounds are executed sequentially, the input of one round being the output of the previous round. The right half of the input, and the round key, are run through a function f that produces 32 bits of output; that output is then xor'ed into the left half, and the resulting left and right halves are swapped (see Figure 9-6).

Figure 9-6. DES message encipherment and decipherment.

graphics/09fig06.gif

The function f provides the strength of the DES. The right half of the input (32 bits) is expanded to 48 bits, and this is xor'ed with the round key. The resulting 48 bits are split into eight sets of six bits each, and each set is put through a substitution table called the S-box. Each S-box produces four bits of output. They are catenated into a single 32-bit quantity, which is permuted. The resulting 32 bits constitute the output of the f function (see Figure 9-7).

Figure 9-7. The f function.

graphics/09fig07.gif

When the DES was first announced, it was criticized as too weak. First, Diffie and Hellman [296] argued that a key length of 56 bits was simply too short, and they designed a machine that could break a DES-enciphered message in a matter of days. Although their machine was beyond the technology of the time, they estimated that it could soon be built for about $20,000,000. Second, the reasons for many of the decisions in the design of the DES—most notably, those involving the S-boxes—were classified. Many speculated that the classification hid "trapdoors," or ways to invert the cipher without knowing the key.

Some properties of the DES were worrisome. First, it had four weak keys (keys that were their own inverses) and 12 semiweak keys (keys whose inverses were other keys). Second, let graphics/kbar.gif, graphics/mbar.gif, and graphics/cbar.gif be the complement of the key k, the plaintext m, and the ciphertext c, respectively. Let DESk(m) be the encipherment of plaintext m under key k. Then the complementation property states that

graphics/09equ01.gif


Third, some of the S-boxes exhibited irregular properties. The distribution of odd and even numbers was nonrandom, raising concerns that the DES did not randomize the input sufficiently. Several output bits of the fourth S-box seemed to depend on some of the output bits of the third S-box. This again suggested that there was a structure to the S-boxes, and because some of the design decisions underlying the S-boxes were unknown, the reasons for the structure were unknown. The structure made hardware implementation of the DES simpler [1005]. It distributed the dependence of each output bit on each input bit rapidly, so that after five rounds each output bit depended on every key and input bit [701]. It could have been needed to prevent the cipher from being broken easily. It also could enable a trapdoor to allow the cipher to be broken easily. There was considerable speculation that the NSA had weakened the algorithm, although a congressional investigation did not reflect this [63].

In 1990, a breakthrough in cryptanalysis answered many of these questions. Biham and Shamir applied a technique called differential cryptanalysis to the DES [96, 97, 98]. This technique required them to generate 247 pairs of chosen plaintext and ciphertext, considerably fewer than the trial-and-error approach others had used. During the development of this technique, they found several properties of the DES that appeared to answer some of the questions that had been raised.

First, for a known plaintext attack, differential cryptanalysis requires 256 plaintext and ciphertext pairs for a 15-round version of the DES. For the full 16 rounds, 258 known plaintext and ciphertext pairs are needed, which is more than sufficient for a trial-and-error approach. (Matsui subsequently improved this using a variant attack called linear cryptanalysis [663]; this attack requires 243 known plaintext and ciphertext pairs on the average.) Second, small changes in the S-boxes weakened the cipher (so that the required number of chosen plaintext and ciphertext pairs was reduced). Third, making every bit of the round keys independent (for an effective key length of 16 x 48 = 768 bits) did not make the DES resistant to differential cryptanalysis, which suggests that the designers of the DES knew about differential analysis. Coppersmith later confirmed this [234].

The DES is used in several modes [744]. Using it directly is called electronic code book (ECB) mode, and is very rare. Modes in which it can be used to generate a pseudo-one-time pad are cipher feed back (CFB) mode (see Section 11.2.1.2) and output feed back (OFB) mode (see Section 11.2.1.1). Its most common modes of use are cipher block chaining (CBC) mode (see Section 11.2.2), encrypt-decrypt-encrypt (EDE) mode, and triple DES mode (the EDE and triple DES modes are described in Section 11.2.2.1).

The CBC mode is an iterative mode in which a block of ciphertext depends not only on its input but also on the preceding ciphertext block. In addition to a 64-bit key, it requires a 64-bit initialization vector. Figure 9-8 shows this mode. It has the self-healing property. This property says that if one block of ciphertext is altered, the error propagates for at most two blocks. Figure 9-9 shows how a corrupted block affects others.

Figure 9-8. Cipher block chaining mode. The left diagram shows encipherment; each ciphertext is "fed back" into the cipher stream. The right diagram shows decipherment.

graphics/09fig08.gif

Figure 9-9. Example of the self-healing property. The ciphertext at the top was stored incorrectly (the italicized 4c should be 4b). Its decipherment is shown next, with the incorrect octets italicized. The plaintext used to create the ciphertext is shown at the bottom.

graphics/09fig09.gif

The EDE mode is used by many financial institutions. It requires two 64bit keys k and k'. The ciphertext c corresponding to some data m is c = DESk (DESk'–1(DESk(m))). Triple DES uses three keys k, k', and k'', and the second step is an encipherment, not a decipherment: c = DESk(DESk'(DESk"(m))).

In 1998, a design for a computer system and software that could break any DES-enciphered message in a few days was published [395]. This design complemented several challenges to break specific DES messages. Those challenges had been solved using computers distributed throughout the Internet. By 1999, it was clear that the DES no longer provided the same level of security as it had 10 years earlier, and the search was on for a new, stronger cipher (to be called the Advanced Encryption Standard, or AES) to fill the needs that the DES no longer filled.

The DES is one of the most important classical cryptosystems in the history of cryptography. It provided the impetus for many advances in the field and laid the theoretical and practical groundwork for many other ciphers. While analyzing it, researchers developed differential and linear cryptanalysis. Cryptographers developed other ciphers to avoid real, or perceived, weaknesses; cryptanalysts broke many of these ciphers and found weaknesses in others. Many of the features of the DES are used in other ciphers. Hence, even though it is nearing the end of its useful lifetime, it is well worth understanding.

In late 2001, the National Institute of Standards and Technology announced the selection of Rijndael as the Advanced Encryption Standard [754], the successor to the DES. Like the DES, the AES is a product cipher. Unlike the DES, the AES can use keys of 128, 192, or 256 bits and operates on blocks of 128 bits. It was specifically designed to withstand the attacks to which the DES showed weaknesses [254]. Time will show how rapidly it supplants the DES, but the lessons learned from the DES have borne fruit.

9.2.4 Other Classical Ciphers

Several algorithms have been proposed to overcome the weaknesses in the DES. NewDES (which, despite its name, is not a variant of DES but a new algorithm) has a block size of 64 bits and a key length of 120 bits [895]. However, it can be broken using an attack similar to differential cryptanalysis [888]. FEAL is another block cipher, with a block size of 64 bits and a key size of 64 bits [721, 915]. FEAL-4 (FEAL with four rounds) and FEAL-8 (FEAL with eight rounds) fell to differential cryptanalysis with 20 [738] and 10,000 [393] chosen plaintexts, respectively. Biham and Shamir broke FEAL-N, which uses N rounds, for N < 32 by differential crypt-analysis more quickly than by trial-and-error [97]. It was proposed that the key be lengthened to 128 bits, but the 128-bit key proved as easy to break as FEAL-N with the original 64-bit key. REDOC-II [252] has an 80-bit block and a 160-bit key. It has 10 rounds, and although a single round was successfully cryptanalyzed [95], the use of 10 rounds appears to withstand differential cryptanalysis.

LOKI89 [150], proposed as an alternative to the DES, was vulnerable to differential cryptanalysis [95]. Its successor, LOKI91 [151], uses a 64-bit key and a 64-bit block size. Differential cryptanalysis fails to break this cipher [577]. Khufu [699] has a block size of 64 bits and a key size of 512 bits. When used with 24 or 32 rounds, it resists chosen plaintext attacks. Its S-boxes are computed from the keys. Khafre [699], similar in design to Khufu, uses fixed S-boxes, but it has been broken [95].

IDEA is an eight-round cipher that uses 64-bit blocks and 128-bit keys [604]. It uses three operations: exclusive or's, addition modulo 216, and multiplication modulo 216 + 1. It appears to withstand known attacks but is too new for any definitive statement to be made about its security [888]. It is used in noncommercial software—notably, in the electronic mail program PGP [1073]—but is patented and requires licensing for use in commercial software.


  Previous section   Next section
Top