# Cryptography/Breaking Vigenère cipher

Plain text is encrypted using the Vigenère cipher by first choosing a keyword consisting of letters from the alphabet of symbols used in the plain text. The keyword is then used to encrypt the text, as the following example shows.

Using the plain text "I Like Wiki Books" and choosing the keyword "cta":

1. Map each letter of the plain text to a number from 0 to 25 (or up to one less than the size of your alphabet), ignoring spaces and case

```
ilikewikibooks converts to 8 11 8 10 4 22 8 10 8 1 14 14 10 18
```

2. Map your keyword to numbers the same way

```
cta maps to 2 19 0
```

3. Add the key to the plain text number by number, repeating the key as often as needed

```
 8  11   8  10   4  22   8  10   8   1  14  14  10  18
 2  19   0   2  19   0   2  19   0   2  19   0   2  19
resulting in
10  30   8  12  23  22  10  29   8   3  33  14  12  37
```

4. Take each resulting number mod 26 (or, in the general case, mod the number of characters in your alphabet)

```
resulting in
10   4   8  12  23  22  10   3   8   3   7  14  12  11
```

5. Map each number back to a letter to get the resulting ciphertext

```
keimxwkdidhoml
```
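The five steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `vigenere_encrypt` and the constant `ALPHABET` are hypothetical, and the input is assumed to contain only lowercase letters with spaces already removed.

```python
import string

# Assumed alphabet: 26 lowercase letters, a=0 ... z=25.
ALPHABET = string.ascii_lowercase


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Encrypt a lowercase, letters-only plaintext with a repeating keyword."""
    # Steps 1 and 2: map the plain text and the keyword to numbers.
    text_nums = [ALPHABET.index(ch) for ch in plaintext]
    key_nums = [ALPHABET.index(ch) for ch in keyword]
    # Steps 3 and 4: add the repeating key, then reduce mod the alphabet size.
    cipher_nums = [
        (t + key_nums[i % len(key_nums)]) % len(ALPHABET)
        for i, t in enumerate(text_nums)
    ]
    # Step 5: map each number back to a letter.
    return "".join(ALPHABET[n] for n in cipher_nums)


print(vigenere_encrypt("ilikewikibooks", "cta"))  # keimxwkdidhoml
```

Indexing the key with `i % len(key_nums)` is what repeats the keyword along the plain text.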

The message can easily be decrypted with the keyword by reversing the above process: subtract the key numbers instead of adding them, again taking the result mod 26. The keyword can be any length up to that of the plain text.
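Reversing the process amounts to subtracting the key instead of adding it. A minimal sketch, assuming the same lowercase 26-letter alphabet as the example (the name `vigenere_decrypt` is hypothetical):

```python
import string

# Assumed alphabet: 26 lowercase letters, a=0 ... z=25.
ALPHABET = string.ascii_lowercase


def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """Decrypt a lowercase, letters-only ciphertext with a repeating keyword."""
    size = len(ALPHABET)
    key_nums = [ALPHABET.index(ch) for ch in keyword]
    # Subtract the repeating key; Python's % always returns a value in
    # 0..size-1 here, so negative differences wrap around correctly.
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key_nums[i % len(key_nums)]) % size]
        for i, ch in enumerate(ciphertext)
    )


print(vigenere_decrypt("keimxwkdidhoml", "cta"))  # ilikewikibooks
```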

Without the keyword, the primary method of breaking the Vigenère cipher is known as the Kasiski test, after the Prussian major who first published it. The first stage is determining the length of the keyword.

### Determining the key length

Given an enciphered message such as:

```
Plaintext:  TOBEORNOTTOBE
Keyword:    KEYKEYKEYKEYK
Ciphertext: DSZOSPXSRDSZO
```

Upon inspection of the ciphertext, we see that there are a few digraphs repeated, namely DS, SZ, and ZO. It is statistically unlikely that all of these would arise by random chance; the odds are that repeated digraphs in the ciphertext correspond to repetitions in the plaintext. If that is the case, the digraphs must be encoded by the same section of the key both times. Therefore, the length of the key is a factor of the distance in the text between the repetitions.

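| Digraph | First position | Second position | Distance | Factors |
|---------|----------------|-----------------|----------|---------|
| DS      | 1              | 10              | 9        | 3, 9    |
| SZ      | 2              | 11              | 9        | 3, 9    |
| ZO      | 3              | 12              | 9        | 3, 9    |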

The common factors greater than 1 (indeed, the only such factors in this simple example) are 3 and 9. This narrows down the possibilities significantly, and the effect is even more pronounced with longer texts and keys.
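The search for repeated digraphs and the factoring of their distances can be sketched as follows. The helper names `repeated_ngram_distances` and `factors` are hypothetical, and the sketch simply reports every repetition; on longer texts some repetitions arise by chance, so the factor shared by most distances is usually the one to try first.

```python
from collections import defaultdict


def repeated_ngram_distances(ciphertext: str, n: int = 2) -> dict:
    """Map each repeated n-gram to the distances between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i + 1)  # 1-based positions
    return {
        gram: [b - a for a, b in zip(pos, pos[1:])]
        for gram, pos in positions.items()
        if len(pos) > 1
    }


def factors(distance: int) -> list:
    """Divisors of distance greater than 1, i.e. candidate key lengths."""
    return [k for k in range(2, distance + 1) if distance % k == 0]


distances = repeated_ngram_distances("DSZOSPXSRDSZO")
print(distances)   # {'DS': [9], 'SZ': [9], 'ZO': [9]}
print(factors(9))  # [3, 9]
```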

### Frequency analysis

Once the length of the key is known, a slightly modified frequency analysis technique can be applied. Suppose the length of the key is known to be three. Then every third letter will be encrypted with the same letter of the key. The ciphertext can be split into three segments, one for each key letter, and the procedure described for the Caesar cipher can be applied to each segment.
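Splitting the ciphertext by key position might look like the sketch below. The helper names are hypothetical, and `guess_shift` uses one simple heuristic (assume the most frequent letter in a segment stands for a chosen common plaintext letter, `E` by default), which is not necessarily the exact procedure given for the Caesar cipher. The input is assumed to be uppercase letters only.

```python
from collections import Counter


def split_by_key_position(ciphertext: str, key_length: int) -> list:
    """Group the letters that were encrypted with the same key letter."""
    return [ciphertext[i::key_length] for i in range(key_length)]


def guess_shift(segment: str, assumed_most_common: str = "E") -> int:
    """Guess a segment's Caesar shift from its most frequent letter.

    Assumption: the most frequent ciphertext letter in the segment
    stands for assumed_most_common in the plain text.
    """
    most_common_letter, _ = Counter(segment).most_common(1)[0]
    return (ord(most_common_letter) - ord(assumed_most_common)) % 26


segments = split_by_key_position("DSZOSPXSRDSZO", 3)
print(segments)  # ['DOXDO', 'SSSS', 'ZPRZ']
```

A 13-letter sample is far too short for letter frequencies to be reliable. In this example the second segment decrypts to four O's rather than E's, so the default assumption would give the wrong shift there; the heuristic only becomes dependable when each segment is long enough to resemble ordinary text statistically.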