Manual for the Solution of Military Ciphers
CHAPTER II
PRINCIPLES OF MECHANISM OF A WRITTEN LANGUAGE
With a few exceptions, notably Chinese, all modern languages are constructed of words which in turn are formed from letters. In any given language the number of letters and their conventional order are fixed. Thus English is written with 26 letters and their conventional order is A, B, C, D, E, etc. Some letters are used very frequently and others rarely. In fact, if ten thousand consecutive letters of a text be counted and the frequency of occurrence of each letter be noted, the numbers found will be practically identical with those obtained from any other text of ten thousand letters in the same language. The relative proportion of occurrence of the various letters will also hold approximately for even very short texts.
Such a count of a large number of letters, when it is put in the form of a table, is known as a frequency table. Every language has its own distinctive frequency table and, for any given language, the frequency table is almost as fixed as the alphabet. There are minor differences in frequency tables prepared from texts on special subjects. For example, if the text be newspaper matter, the frequency table will differ slightly from one prepared from military orders and will also differ slightly from one prepared from telegraph messages. But these differences are very slight as compared with the differences between the frequency tables of two different languages.
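The count described above can be sketched in a few lines of Python. This is only an illustration and not part of the original method; the function names `frequency_table` and `per_ten_thousand` are hypothetical, and the sketch assumes that only the 26 letters A to Z are counted, with case, spaces and punctuation ignored.

```python
from collections import Counter


def frequency_table(text: str) -> dict[str, int]:
    """Count each letter A-Z in text, ignoring case and non-letters."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    # Most frequent letter first; ties are broken alphabetically.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


def per_ten_thousand(table: dict[str, int]) -> dict[str, int]:
    """Scale raw counts to a basis of 10,000 letters (rounded)."""
    total = sum(table.values())
    if total == 0:
        return {}
    return {letter: round(10000 * n / total) for letter, n in table.items()}


print(frequency_table("Attack at dawn"))
# {'A': 4, 'T': 3, 'C': 1, 'D': 1, 'K': 1, 'N': 1, 'W': 1}
```

Scaling to a common basis, as `per_ten_thousand` does, is what allows a count from a short text to be compared with a table prepared from a long one.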
Again, there is a fixed ratio of occurrence of every letter with every other for any language and this, put in table form, constitutes a table of frequency of digraphs. In the same way a table of trigraphs, showing the ratio of occurrence of any three letters in sequence, could be prepared, but such a table would be very extensive and a count of the more common three-letter combinations is usually used instead.
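A count of digraphs or trigraphs may be sketched as follows. The function name `ngram_counts` is hypothetical, and the sketch assumes that the text is treated as one continuous run of letters, so that groups spanning two words are counted as well.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive letters (n=2 digraphs, n=3 trigraphs)."""
    # Keep only A-Z so that groups may run across word boundaries.
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


digraphs = ngram_counts("THE THEME", 2)
print(digraphs["TH"], digraphs["HE"])  # 2 2
print(ngram_counts("THE THEME", 3).most_common(1))  # [('THE', 2)]
```

Calling `most_common(k)` on the result gives the short list of the commonest combinations, which is usually all that is wanted for trigraphs.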
Other tables, such as frequency of initial and final letters of words, might be of value but the common practice is to put cipher text into groups of five or ten letters each and eliminate word forms. This is almost a necessity in telegraphic and radio communication to enable the receiving operator to check correct receipt of a message. He must get five letters, neither more nor less, per word or he is sure a mistake has been made. There is little difficulty, as a rule, in restoring word forms in the deciphered message.
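The grouping into fives, and the receiving operator's check, may be sketched thus. The names `group_letters` and `bad_groups` are hypothetical; the sketch does not pad a short final group, so such a group would be reported by the check.

```python
def group_letters(text: str, size: int = 5) -> list[str]:
    """Drop word forms and split the letters into groups of `size`."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return [letters[i:i + size] for i in range(0, len(letters), size)]


def bad_groups(groups: list[str], size: int = 5) -> list[int]:
    """Return the positions of groups that do not hold exactly `size` letters."""
    return [i for i, group in enumerate(groups) if len(group) != size]


print(" ".join(group_letters("Attack at dawn")))  # ATTAC KATDA WN
print(bad_groups(["ATTAC", "KATD", "AWNXX"]))     # [1]
```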
We will now take up, in order, the various frequency tables and linguistic peculiarities of English and Spanish. Frequency tables for single letters in French, German, Italian and Portuguese will follow. All frequency tables have been re-calculated from at least ten thousand letters of text and compared with existing tables. No marked difference has been found in any case between the re-calculated tables and those already in use.
Data for Solution of Ciphers in English
Table I.--Normal frequency table. Frequency for ten thousand letters and for two hundred letters. This latter is put in graphic form and is necessarily an approximation. Taken from military orders and reports, English text.
Letter  10,000 Letters  200 Letters
A            778            16   1111111111111111
B            141             3   111
C            296             6   111111
D            402             8   11111111
E           1277            26   11111111111111111111111111
F            197             4   1111
G            174             3   111
H            595            12   111111111111
I            667            13   1111111111111
J             51             1   1
K             74             2   11
L            372             7   1111111
M            288             6   111111
N            686            14   11111111111111
O            807            16   1111111111111111
P            223             4   1111
Q              8
R            651            13   1111111111111
S            622            12   111111111111
T            855            17   11111111111111111
U            308             6   111111
V            112             2   11
W            176             3   111
X             27
Y            196             4   1111
Z             17
Vowels AEIOU = 38.37%; consonants LNRST = 31.86%; consonants JKQXZ = 1.77%.
The vowels may be safely taken as 40%, consonants LNRST as 30% and consonants JKQXZ as 2%.
Order of letters: E T O A N I R S H D L U C M P F Y W G B V K J X Z Q.
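The percentages and the order of letters given above follow directly from the counts of Table I, as the following sketch shows. The counts are copied from Table I; the names `TABLE_I`, `group_percentage` and `order_of_letters` are hypothetical.

```python
# Counts per 10,000 letters, copied from Table I.
TABLE_I = {
    "A": 778, "B": 141, "C": 296, "D": 402, "E": 1277, "F": 197, "G": 174,
    "H": 595, "I": 667, "J": 51, "K": 74, "L": 372, "M": 288, "N": 686,
    "O": 807, "P": 223, "Q": 8, "R": 651, "S": 622, "T": 855, "U": 308,
    "V": 112, "W": 176, "X": 27, "Y": 196, "Z": 17,
}


def group_percentage(letters: str, table: dict[str, int] = TABLE_I) -> float:
    """Share of the whole count taken by the given letters, in percent."""
    total = sum(table.values())
    return round(100 * sum(table[ch] for ch in letters) / total, 2)


def order_of_letters(table: dict[str, int] = TABLE_I) -> str:
    """Letters arranged from most to least frequent."""
    return "".join(sorted(table, key=lambda ch: (-table[ch], ch)))


print(group_percentage("AEIOU"))  # 38.37
print(group_percentage("LNRST"))  # 31.86
print(group_percentage("JKQXZ"))  # 1.77
print(order_of_letters())         # ETOANIRSHDLUCMPFYWGBVKJXZQ
```

The same two functions may be applied to a count made from a cipher text, and the results compared with the figures for the normal table.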
Table II.--Frequency table for telegraph messages, English text. This table varies slightly from the standard frequency table because the common word "the" is rarely used in telegrams and there is a tendency to use longer and less common words in preparing telegraph messages.
Letter  10,000 Letters  200 Letters
A            813            16   1111111111111111
B            149             3   111
C            306             6   111111
D            417             8   11111111
E           1319            26   11111111111111111111111111
F            205             4   1111
G            201             4   1111
H            386             8   11111111
I            711            14   11111111111111
J             42             1   1
K             88             2   11
L            392             8   11111111
M            273             6   111111
N            718            14   11111111111111
O            844            17   11111111111111111
P            243             5   11111
Q             38             1   1
R            677            14   11111111111111
S            656            13   1111111111111
T            634            13   1111111111111
U            321             6   111111
V            136             3   111
W            166             3   111
X             51             1   1
Y            208             4   1111
Z              6
In this table the vowels AEIOU = 40.08%, consonants LNRST = 30.77% and consonants JKQXZ = 2.25%.
Order of letters: E O A N I R S T D L H U C M P Y F G W B V K X J Q Z.
Table III.--Table of frequency of digraphs, duals or pairs (English). This table was prepared from 20,000 letters, but the figures shown are on the basis of 2,000 letters. For this reason they are, to a certain extent, approximate; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal line at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus in two thousand letters we may expect to find AH once and HA twenty-six times.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A 1 7 10 22 3 2 26 4 2 2 7 8 11 2 9 13 12 9 2 4 1 12 B 5 1 2 1 1 1 1 2 2 1 3 1 C 6 1 1 14 2 11 11 3 2 3 1 1 1 1 D 6 12 30 1 2 4 30 1 4 1 1 1 1 3 E 11 14 16 12 2 6 33 10 2 6 18 14 12 1 7 36 11 12 2 16 5 1 1 F 3 2 8 2 1 2 2 1 3 25 3 1 1 1 G 4 1 3 2 11 2 3 1 H 1 11 2 4 1 4 1 2 1 1 2 10 50 3 2 I 2 1 4 12 6 5 1 12 1 5 9 8 12 1 3 12 13 22 2 3 6 1 1 J 1 K 1 1 2 2 1 1 L 14 6 2 1 6 1 1 1 6 9 3 6 3 3 2 3 5 M 7 3 13 2 2 3 4 1 10 4 1 1 2 N 38 3 25 2 1 31 3 2 2 39 4 3 11 2 O 1 1 12 4 8 8 3 12 18 2 4 7 8 3 7 13 15 22 2 6 1 5 P 2 1 8 1 2 4 2 3 2 1 8 1 4 3 1 Q 2 1 1 1 R 16 1 3 3 40 3 6 2 6 1 2 1 25 8 2 2 8 11 2 S 16 1 3 25 1 2 17 1 2 1 12 7 2 9 11 6 11 1 6 T 25 1 3 12 13 5 2 3 20 2 1 24 8 2 16 20 11 6 2 2 7 U 1 2 1 6 1 3 2 2 3 3 1 17 1 5 3 5 5 1 V 3 1 5 5 3 2 5 1 W 1 2 8 1 1 1 1 2 4 2 3 3 X 1 4 2 1 1 Y 3 2 2 4 1 1 8 1 2 1 3 1 7 Z 1 1 1
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Table IV.--Order of frequency of common pairs to be expected in a count of 2,000 letters of military or semi-military English text. (Based on a count of 20,000 letters).
TH 50    AT 25    ST 20
ER 40    EN 25    IO 18
ON 39    ES 25    LE 18
AN 38    OF 25    IS 17
RE 36    OR 25    OU 17
HE 33    NT 24    AR 16
IN 31    EA 22    AS 16
ED 30    TI 22    DE 16
ND 30    TO 22    RT 16
HA 26    IT 20    VE 16
Table V.--Table of recurrence of groups of three letters to be expected in a count of 10,000 letters of English text.
THE 89    TIO 33    EDT 27
AND 54    FOR 33    TIS 25
THA 47    NDE 31    OFT 23
ENT 39    HAS 28    STH 21
ION 36    NCE 27    MEN 20
Table VI.--Table of frequency of occurrence of letters as initials and finals of English words. Based on a count of 4,000 words; this table gives the figures for an average 100 words and is necessarily an approximation, like Table III. English words are derived from so many sources that it is not impossible for any letter to occur as an initial or final of a word, although Q, X and Z are rare as initials and B, I, J, Q, V, X and Z are rare as finals.
Letter  Initial  Final
A           9       1
B           6       -
C           6       -
D           5      10
E           2      17
F           4       6
G           2       4
H           3       2
I           3       -
J           1       -
K           1       1
L           2       6
M           4       1
N           2       9
O          10       4
P           2       1
Q           -       -
R           4       8
S           5       9
T          17      11
U           2       1
V           -       -
W           7       1
X           -       -
Y           3       8
Z           -       -
It is practically impossible to find five consecutive letters in an English text without a vowel; in any five consecutive letters we may expect from one to three vowels, with two as the general average. In any twenty letters we may expect to find from 6 to 9 vowels, with 8 as an average. Among themselves, the relative frequency of occurrence of each of the vowels (including Y when a vowel) is as follows:
A, 19.5% E, 32.0% I, 16.7% O, 20.2% U, 8.0% Y, 3.6%
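The statements about the spacing of vowels can be checked on any text with a sketch such as the following. The function names are hypothetical. The sketch counts only A, E, I, O and U as vowels, since a program cannot readily tell when Y is used as a vowel; this is a simplifying assumption.

```python
def vowel_counts(text: str, window: int = 20, vowels: str = "AEIOU") -> list[int]:
    """Number of vowels in every run of `window` consecutive letters."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    flags = [ch in vowels for ch in letters]
    return [sum(flags[i:i + window]) for i in range(len(flags) - window + 1)]


def longest_run_without_vowel(text: str, vowels: str = "AEIOU") -> int:
    """Length of the longest run of consecutive letters containing no vowel."""
    longest = current = 0
    for ch in text.upper():
        if not "A" <= ch <= "Z":
            continue  # spaces and punctuation do not break a run
        current = 0 if ch in vowels else current + 1
        longest = max(longest, current)
    return longest


print(vowel_counts("AEIOUBCD", window=4))      # [4, 4, 3, 2, 1]
print(longest_run_without_vowel("STRENGTHS"))  # 5
```

The second example shows one of the rare English words which does contain five consecutive letters without a vowel.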
The foregoing tables give all the essential facts about the mechanism of the English language from the standpoint of the solution of ciphers. The use to be made of these tables will be evident when the solution of different types of ciphers is taken up.
Data for the Solution of Ciphers in Spanish
The Spanish language is written with the following alphabet:
A B C CH D E F G H I J L LL M N Ñ O P Q R RR S T U V X Y Z
while the exact sense often depends upon the use of accents over the vowels. However, in cipher work it is exceedingly inconvenient to use the permanent digraphs, CH, LL and RR and they do not appear as such in any specimens of Spanish or Mexican cipher examined. Accented vowels and Ñ are also not found and we may, in general, say that a cipher whose text is Spanish will be prepared with the following alphabet:
A B C D E F G H I J L M N O P Q R S T U V X Y Z
and the receiver must supply the accents and the tilde over the N to conform to the general sense.
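The reduction of Spanish text to this cipher alphabet may be sketched as follows. The function name `to_cipher_alphabet` is hypothetical. The sketch relies on the standard `unicodedata` module to separate each accent or tilde from its letter, and it keeps every letter A to Z, so that K and W, if present, pass through unchanged.

```python
import unicodedata


def to_cipher_alphabet(text: str) -> str:
    """Strip accents and the tilde, drop spaces and punctuation, upper-case."""
    # NFD splits an accented letter into the base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text.upper())
    return "".join(ch for ch in decomposed if "A" <= ch <= "Z")


print(to_cipher_alphabet("Año de la niñez, acción"))  # ANODELANINEZACCION
```

The permanent digraphs CH, LL and RR need no special treatment here, since each is simply written as two letters.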
However, many Mexican cipher alphabets contain the letters K and W. This is particularly true of the ciphers in use by secret service agents who must be prepared to handle words like NEW YORK, WILSON and WASHINGTON. The letters K and W will, however, have a negligible frequency except in short messages where words like these occur more than once.
In this connection, if a cipher contains Mexican geographical names like CHIHUAHUA, MEXICO, MUZQUIZ, the letters H, X and Z will have a somewhat exaggerated frequency.
In Spanish, the letter Q is always followed by U and the U is always followed by one of the other vowels, A, E, I or O. As QUE or QUI occurs not infrequently in Spanish text, particularly in telegraphic correspondence, it is well worth noting that, if a Q occurs in a transposition cipher, we must connect it with U and another vowel. The clue to several transposition ciphers has been found from this simple relation.
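As an aid in applying this relation, the following sketch lists, for each Q in a cipher text, the distance to every U in the same text. The function names are hypothetical, and the sketch is only a help to the eye: it does not decide which U belongs to which Q, and it assumes a transposition cipher, in which the letters of the original text are all present but rearranged.

```python
def q_u_offsets(cipher: str) -> dict[int, list[int]]:
    """For each position of Q, the signed distances to every U in the text."""
    letters = [ch for ch in cipher.upper() if "A" <= ch <= "Z"]
    q_positions = [i for i, ch in enumerate(letters) if ch == "Q"]
    u_positions = [i for i, ch in enumerate(letters) if ch == "U"]
    return {q: [u - q for u in u_positions] for q in q_positions}


def enough_u_for_q(cipher: str) -> bool:
    """True if the text holds at least as many U's as Q's."""
    upper = cipher.upper()
    return upper.count("U") >= upper.count("Q")


print(q_u_offsets("EQXUAU"))   # {1: [2, 4]}
print(enough_u_for_q("QQU"))   # False
```

If the second function returns False, the text can hardly be a simple transposition of Spanish, since some Q would be left without its U.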
Table VII.--Normal frequency table for military orders and reports, calculated on a basis of 10,000 letters of Spanish text. The graphic form is on a basis of 200 letters.
Letter  10,000 Letters  200 Letters
A           1352            27   111111111111111111111111111
B            102             2   11
C            474             9   111111111
D            524            10   1111111111
E           1402            28   1111111111111111111111111111
F             91             2   11
G            137             3   111
H            102             2   11
I            606            12   111111111111
J             41             1   1
L            517            10   1111111111
M            300             6   111111
N            619            12   111111111111
O            818            16   1111111111111111
P            257             5   11111
Q             87             2   11
R            751            15   111111111111111
S            724            14   11111111111111
T            422             8   11111111
U            387             7   1111111
V             85             2   11
X              6
Y            103             2   11
Z             42             1   1
In this table the vowels AEIOU = 45.65%; consonants LNRST = 30.33%; consonants JKQXZ = 1.76%.
Order of letters:
E A O R S N I D L C T U M P G Y (BH) F Q V Z J X.
Table VIII.--Table of frequency of digraphs, duals or pairs, Spanish text. Like Table III, this table is on the basis of 2,000 letters although prepared from a count of 20,000 letters. For this reason it is, to a certain extent, an approximation; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal lines at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus, in two thousand letters, we may expect to find AI twice and IA twenty-three times.
A B C D E F G H I J L M N O P Q R S T U V X Y Z
A 9 4 19 11 5 6 17 23 54 18 9 3 20 29 11 21 8 6 2 5 A B 6 3 1 4 B C 24 6 6 24 5 3 8 8 9 5 2 2 C D 31 29 3 19 13 10 9 4 D E 12 2 6 59 10 1 5 7 2 12 18 22 4 9 38 25 28 25 3 3 E F 4 4 4 3 3 1 F G 2 4 8 4 2 G H 2 12 10 2 1 H I 2 23 16 5 2 3 11 13 6 10 5 3 I J 3 2 1 J L 21 3 6 39 3 3 7 21 5 6 12 2 2 L M 12 6 5 1 6 15 7 2 6 1 M N 32 46 2 8 32 12 2 N O 26 22 2 6 3 4 9 16 2 8 20 15 7 11 O P 13 3 2 4 9 2 7 4 11 P Q 11 5 1 2 3 1 Q R 40 27 2 4 4 36 3 11 17 3 R S 39 52 10 7 14 2 14 3 S T 5 13 4 4 18 5 6 30 T U 2 4 2 6 3 4 5 2 6 4 17 15 2 1 U V 2 2 2 2 2 2 V X X Y 5 6 2 5 2 2 Y Z 1 2 1 4 2 Z
A B C D E F G H I J L M N O P Q R S T U V X Y Z
Table IX.--Order of frequency of common pairs to be expected in a count of 2,000 letters of Spanish military orders and reports. Based on Table VIII.
DE 59    ON 32    AC 24
LA 54    AD 31    EC 24
ES 52    ST 30    CI 23
EN 46    ED 29    IA 23
AR 40    RA 29    DO 22
AS 39    TE 28    NE 22
EL 39    ER 27    AL 21
RE 38    CO 26    LL 21
OR 36    SE 25    PA 20
AN 32    UE 25    PO 20
Alphabetic Frequency Tables
(Truesdell)
Frequency of occurrence in 1,000 letters of text:
Letter French German Italian Portuguese
A 80 52 117 140
B 6 18 6 6
C 33 31 45 34
D 40 51 31 40
E 197 173 126 142
F 9 21 10 12
G 7 42 17 10
H 6 41 6 10
I 65 81 114 59
J 3 1 [1] 5
K [1] 10 [1]
L 49 28 72 32
M 31 20 30 46
N 79 120 66 48
O 57 28 93 110
P 32 8 30 28
Q 12 [1] 3 16
R 74 69 64 64
S 66 57 49 88
T 65 60 60 43
U 62 51 29 46
V 21 9 20 15
W [1] 15
X 3 [1] [1] 1
Y 2 [1] [1] 1
Z 1 14 12 4
Order of Frequency
French
E A N R S I U O L D C P M V Q F G B J Y Z T H X
German
E N I R T S A D G H C L F M B W Z K V P J Q X Y U O
Italian
E A I O L N R T S C D M U V G Z F B Q P H
Portuguese
E A O S R I N M T D C L P Q V F G B J Z X Y U H
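Since each language has its own distinctive table, a count made from an unknown text may be compared with the figures above to suggest the language. The following is a minimal sketch only. The figures for nine common letters are copied from the alphabetic table above; the names `PER_THOUSAND`, `rates_per_thousand` and `closest_language`, and the measure of difference used (the sum of the absolute differences), are assumptions made for illustration and are not taken from the original.

```python
from collections import Counter

# Frequency per 1,000 letters for nine common letters, from the table above.
PER_THOUSAND = {
    "French":     {"A": 80, "E": 197, "I": 65, "N": 79, "O": 57,
                   "R": 74, "S": 66, "T": 65, "U": 62},
    "German":     {"A": 52, "E": 173, "I": 81, "N": 120, "O": 28,
                   "R": 69, "S": 57, "T": 60, "U": 51},
    "Italian":    {"A": 117, "E": 126, "I": 114, "N": 66, "O": 93,
                   "R": 64, "S": 49, "T": 60, "U": 29},
    "Portuguese": {"A": 140, "E": 142, "I": 59, "N": 48, "O": 110,
                   "R": 64, "S": 88, "T": 43, "U": 46},
}


def rates_per_thousand(text: str) -> dict[str, float]:
    """Frequency of each letter in text, scaled to 1,000 letters."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: 1000 * n / len(letters) for ch, n in counts.items()}


def closest_language(rates: dict[str, float]) -> str:
    """Language whose table differs least from the given rates (assumed measure)."""
    def difference(language: str) -> float:
        table = PER_THOUSAND[language]
        return sum(abs(rates.get(ch, 0) - expected) for ch, expected in table.items())
    return min(PER_THOUSAND, key=difference)


print(closest_language({"A": 115, "E": 125, "I": 110, "N": 70, "O": 90,
                        "R": 60, "S": 50, "T": 60, "U": 30}))  # Italian
```

A sketch of this kind applies only where the cipher leaves the letters themselves unchanged, as in a transposition, and a short text may easily mislead it.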
Graphic Frequency Tables
Frequency of occurrence in 200 letters of text.
French
A  16   1111111111111111
B   2   11
C   6   111111
D  10   1111111111
E  39   111111111111111111111111111111111111111
F   2   11
G   1   1
H   1   1
I  13   1111111111111
J   1   1
K
L  10   1111111111
M   6   111111
N  16   1111111111111111
O  11   11111111111
P   6   111111
Q   2   11
R  15   111111111111111
S  13   1111111111111
T  13   1111111111111
U  12   111111111111
V   4   1111
W
X   1   1
Y
Z
Italian
A  23   11111111111111111111111
B   1   1
C   9   111111111
D   6   111111
E  25   1111111111111111111111111
F   2   11
G   3   111
H   1   1
I  23   11111111111111111111111
L  14   11111111111111
M   6   111111
N  13   1111111111111
O  19   1111111111111111111
P   6   111111
Q
R  13   1111111111111
S  10   1111111111
T  12   111111111111
U   6   111111
V   4   1111
X
Y
Z   2   11
German
A  10   1111111111
B   4   1111
C   6   111111
D  10   1111111111
E  32   11111111111111111111111111111111
F   4   1111
G   8   11111111
H   8   11111111
I  16   1111111111111111
J
K   2   11
L   6   111111
M   4   1111
N  24   111111111111111111111111
O   6   111111
P   2   11
Q
R  14   11111111111111
S  11   11111111111
T  12   111111111111
U  10   1111111111
V   2   11
W   3   111
X
Y
Z   3   111
Portuguese
A  28   1111111111111111111111111111
B   1   1
C   7   1111111
D   8   11111111
E  28   1111111111111111111111111111
F   2   11
G   2   11
H   2   11
I  12   111111111111
J   1   1
L   6   111111
M   9   111111111
N  10   1111111111
O  22   1111111111111111111111
P   6   111111
Q   3   111
R  13   1111111111111
S  18   111111111111111111
T   9   111111111
U   9   111111111
V   3   111
X
Y
Z   1   1