hale.cryptopals

0.1.0-SNAPSHOT

Cryptopals challenges in Clojure. CircleCI

A collection of 48 exercises that demonstrate attacks on real-world crypto.

This is a different way to learn about crypto than taking a class or reading a book. We give you problems to solve. They're derived from weaknesses in real-world systems and modern cryptographic constructions. We give you enough info to learn about the underlying crypto concepts yourself. When you're finished, you'll not only have learned a good deal about how cryptosystems are built, but you'll also understand how they're attacked.

Work in progress:

~~Start~~ 24th May 2018
~~Set 1~~ 17th June 2018
Set 2
Set 3
Set 4
Set 5
Set 6
Set 7
Set 8

dependencies

org.clojure/clojure

1.9.0

(this space intentionally left almost blank)

hale.cryptopals.set1

toc

Crypto Challenge Set 1

This is the qualifying set. We picked the exercises in it to ramp developers up gradually into coding cryptography, but also to verify that we were working with people who were ready to write code.

This set is relatively easy. With one exception, most of these exercises should take only a couple minutes. But don't beat yourself up if it takes longer than that. It took Alex two weeks to get through the set!

If you've written any crypto code in the past, you're going to feel like skipping a lot of this. Don't skip them. At least two of them (we won't say which) are important stepping stones to later attacks.

(ns hale.cryptopals.set1)

hale.cryptopals.set1.challenge1

toc

Base64 encode a hex string

The string:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

Should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

So go ahead and make that happen. You'll need to use this code for the rest of the exercises.

Cryptopals Rule: Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.

(ns hale.cryptopals.set1.challenge1
  (:require [clojure.set :refer [map-invert]]
            [clojure.string :as str]
            [hale.cryptopals.utils :as utils]
            [clojure.test :as t]))

Axiomatic Base64 map of six-bit bytes (indices) to ASCII characters

The following table visually explains the process of Base64 encoding:

Get the underlying bytestream of the input data. (In the table they are encoding the ASCII word "Man" wherease the exercise has us encode hex)
Parse the bytestream in chunks of six and turn each six-bit sequence into a byte by padding with 2 empty bits (zeros). You can see in the image how the remaining two bits from the first byte are used to generate the next byte, etc until all input bytes are consumed.
Use the decimal value of each six-bit byte to 'lookup' the ASCII char in the map. Because there is only six bits of information in each 'byte', the bit pattern will always resolve to one of the 64 characters in the Base64 encoding map. This is because there are only 64 possible patterns in six bits of information (n bits yield 2^n patterns).

I had originally thought to write an algorithm that processed the list recusively in blocks of 6. However, this is not possible because the smallest unit in the JVM is a Byte. So instead we must process the bytestream in groups of four and use bitwise operators to shift the bits around in each ecimal strings as bytes is in the utils namespace, as it's something we'll be using a lot in these exercises.

(def base64-map
  {  0 \A  1 \B  2 \C  3 \D  4 \E  5 \F  6 \G  7 \H
     8 \I  9 \J 10 \K 11 \L 12 \M 13 \N 14 \O 15 \P
    16 \Q 17 \R 18 \S 19 \T 20 \U 21 \V 22 \W 23 \X
    24 \Y 25 \Z 26 \a 27 \b 28 \c 29 \d 30 \e 31 \f
    32 \g 33 \h 34 \i 35 \j 36 \k 37 \l 38 \m 39 \n
    40 \o 41 \p 42 \q 43 \r 44 \s 45 \t 46 \u 47 \v
    48 \w 49 \x 50 \y 51 \z 52 \0 53 \1 54 \2 55 \3
    56 \4 57 \5 58 \6 59 \7 60 \8 61 \9 62 \+ 63 \/ })

Base64 byte-expansion procedure, to reduce the number of possible values per byte from 256 to 64.

(defn base64-expand-bytes
  ([b1]
   (let [c1 (bit-shift-right b1 2)
         c2 (bit-or (bit-shift-left (bit-and 2r00000011 b1) 4)
                    (byte 0))]
     [c1 c2 nil nil]))
  ([b1 b2]
   (let [c1 (bit-shift-right b1 2)
         c2 (bit-or (bit-shift-left (bit-and 2r00000011 b1) 4)
                    (bit-shift-right b2 4))
         c3 (bit-or (bit-shift-left (bit-and 2r00001111 b2) 2)
                    (byte 0))]
     [c1 c2 c3 nil]))
  ([b1 b2 b3]
   (let [c1 (bit-shift-right b1 2)                             ; first 6 bits of b1
         c2 (bit-or (bit-shift-left (bit-and 2r00000011 b1) 4) ; last 2 bits of b1 plus first 4 bits of b2
                    (bit-shift-right b2 4))
         c3 (bit-or (bit-shift-left (bit-and 2r00001111 b2) 2) ; last 4 bits of b2 plus first 2 bits of b3
                    (bit-shift-right b3 6))
         c4 (bit-and 2r00111111 b3)]                           ; last 6 bits of b3
   [c1 c2 c3 c4])))

(t/deftest test-base64-expand-bytes
  (t/is (= (apply base64-expand-bytes [2r01001101 2r01100001 2r01101110])
           [2r00010011 2r00010110 2r00000101 2r00101110] [19 22 5 46]))
  (t/is (= (apply base64-expand-bytes (map byte [77 97 110]))
           (map byte [19 22 5 46])))
  (t/is (= (apply base64-expand-bytes (map byte [77 0 0]))
           (map byte [19 16 0 0])))
  (t/is (= (apply base64-expand-bytes (map byte [77 97 0]))
           (map byte [19 22 4 0]))))

Base64 encode a stream of bytes

(defn base64-encode
  [bytes]
  (let [triples (partition 3 3 [] bytes)
        quads   (map (partial apply base64-expand-bytes) triples)
        chars   (map (fn [k] (get base64-map k \=)) (flatten quads))]
    (apply str chars)))

(def base64-encode-str (comp base64-encode utils/str-to-bytes))

Test vectors from RFC4648

(t/deftest test-base64-encode
  (t/is (=          (base64-encode-str )))
  (t/is (= "Zg=="     (base64-encode-str "f")))
  (t/is (= "Zm8="     (base64-encode-str "fo")))
  (t/is (= "Zm9v"     (base64-encode-str "foo")))
  (t/is (= "Zm9vYg==" (base64-encode-str "foob")))
  (t/is (= "Zm9vYmE=" (base64-encode-str "fooba")))
  (t/is (= "Zm9vYmFy" (base64-encode-str "foobar"))))

Set 1 :: Challenge 1 :: Base64 encode a hex string

(def hex-to-base64
  (comp base64-encode utils/hex-to-bytes))

(t/deftest test-hex-to-base64
  (t/is (= (hex-to-base64
            (str "49276d206b696c6c696e6720796f757220627261696e206c"
                 "696b65206120706f69736f6e6f7573206d757368726f6f6d"))
            "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t")))

Decode

Decoding reads from the above table backwards:

Reverse lookup the Base64 map to get the index.
Remove the 2 bits of padding from each byte until we end up with regular 8-bit bytes again.

(def base64-chars (set (vals base64-map)))

Base64 reinflation procedure, to remove the zero-padding and turn the sequence of six-bit bytes back in to regular 8-bit bytes.

(defn base64-reduce-bytes
  ([b1 b2]
   (let [c1 (bit-or (bit-shift-left b1 2)
                    (bit-shift-right b2 4))]
     [c1]))
  ([b1 b2 b3]
   (let [c1 (bit-or (bit-shift-left b1 2)
                    (bit-shift-right b2 4))
         c2 (bit-or (bit-shift-left (bit-and 2r1111 b2) 4)
                    (bit-shift-right b3 2))]
     [c1 c2]))
  ([b1 b2 b3 b4]
   (let [c1 (bit-or (bit-shift-left b1 2)                  ; b1 shifted to make room
                    (bit-shift-right b2 4))                ; first 2 bits of b2
         c2 (bit-or (bit-shift-left (bit-and 2r1111 b2) 4) ; last 4 bits of b2
                    (bit-shift-right b3 2))                ; first 4 bits of b3
         c3 (bit-or (bit-shift-left (bit-and 2r11 b3) 6)   ; last 4 bits of b3
                    b4)]                                   ; b4
     [c1 c2 c3])))

(t/deftest test-base64-reduce-bytes
  (t/is (= (apply base64-reduce-bytes [2r00010011 2r00010110 2r00000101 2r00101110])
         [2r01001101 2r01100001 2r01101110])))

Decode a base64 encoded string (or bytestream)

(defn base64-decode
  [str]
  (let [chars     (map char str)
        sanitized (filter (partial contains? base64-chars) chars)
        indices   (map (map-invert base64-map) sanitized)
        quads     (partition 4 4 [] (map byte indices))
        triples   (map (partial apply base64-reduce-bytes) quads)]
    (flatten triples)))

(t/deftest test-base64-decode-long
  (t/is (= (str "49276d206b696c6c696e6720796f757220627261696e206c"
                "696b65206120706f69736f6e6f7573206d757368726f6f6d")
           (utils/bytes-to-hex
            (base64-decode
             "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t")))))

(def base64-to-str (comp utils/bytes-to-str base64-decode))

Test vectors from RFC4648

(t/deftest test-base64-decode
  (t/is (=        (base64-to-str )))
  (t/is (= "f"      (base64-to-str "Zg==")))
  (t/is (= "fo"     (base64-to-str "Zm8=")))
  (t/is (= "foo"    (base64-to-str "Zm9v")))
  (t/is (= "fooba"  (base64-to-str "Zm9vYmE=")))
  (t/is (= "foobar" (base64-to-str "Zm9vYmFy")))
  (t/is (= "foob"   (base64-to-str "Zm9vYg=="))))

hale.cryptopals.set1.challenge2

toc

Fixed XOR

Write a function that takes two equal-length buffers and produces their XOR combination.

If your function works properly, then when you feed it the string:

1c0111001f010100061a024b53535009181c

... after hex decoding, and when XOR'd against:

686974207468652062756c6c277320657965

... should produce:

746865206b696420646f6e277420706c6179

(ns hale.cryptopals.set1.challenge2
  (:require [hale.cryptopals.utils :as utils]
            [clojure.test :as t]
            [clojure.string :as str]))

XORs two equal length bytestreams

(defn xor-combine
  [h1 h2]
  (let [b1    (utils/hex-to-bytes h1)
        b2    (utils/hex-to-bytes h2)
        xored (map bit-xor b1 b2)
        hexes (map (partial format "%x") xored)]
    (apply str hexes)))

(t/deftest test-xor-combine
  (t/is (= (xor-combine "1c0111001f010100061a024b53535009181c"
                      "686974207468652062756c6c277320657965")
         "746865206b696420646f6e277420706c6179")))

hale.cryptopals.set1.challenge3

toc

Single-byte XOR cipher

The hex encoded string:

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

...has been XOR'd against a single character. Find the key, decrypt the message.

You can do this by hand. But don't: write code to do it for you.

How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

(ns hale.cryptopals.set1.challenge3
  (:require [clojure.test :as t]
            [clojure.string :as str]
            [clojure.set :refer [difference]]
            [hale.cryptopals.utils :as utils]
            [hale.cryptopals.set1.challenge1 :as base64]))

DEPRECATED: Measures 'englishness' of a string by the absence of 'weird' chars

I completed this challenge initially by measuring the simple quantity of weird characters in the output. I had the Base64 vals already to hand, so used that as by definition of normalcy.

(defn str-englishness-weirdness
  [str]
  (let [eng-chars       (vals base64/base64-map)
        str-chars       (seq (char-array str))
        non-eng-chars   (difference (set eng-chars) (set str-chars))
        p-non-eng-chars (/ (count non-eng-chars) (count (eng-chars)))]
    (- 1 p-non-eng-chars)))

...this doesn't work very well. Sentences containing spaces in particular are penalized, whereas jumbmles of ASCII are considered all equally English. Back to the drawing board (what was that they said about frequencies...?)

DEPRECATED: Measures 'englishness' of a string by propotion of word chars

Still ignoring the advice about frequencies, I wrote this which looks at the proportion of word characters (this time including space!). This was good enough to pass Challenge 3.

(defn str-englishness-words
  [str]
  (let [eng-chars (count (re-seq #"[a-zA-Z ]" str))]
    (/ eng-chars (count str))))

Frequencies!

After a bit of research, I stumbled across the Chi-squared test as a way of measuring a set of observations against expected outcomes.

The test takes two sets of observations (actual and expected), and reduces the differences between them to a single number.

(defn chi-square [expected actual]
  (/ (utils/square (- actual expected)) expected))

Reduces the difference between two sets of observations to a single number. Lower numbers indicate greater convergence.

Note that this takes a third argument for the 'default' value if nothing was observed. This is not actually supported by the underlying statistic. In fact, the test is unreliable when the number of observations is less than five. However, it works for our purposes if we assign a default small value for missing occurrences.

(defn chi-square-distance
  ([e-freqs a-freqs] (chi-square-distance e-freqs a-freqs 0.000001))
  ([e-freqs a-freqs missing]
  (let [distances (map (fn [[a-key a-val]] (chi-square (get e-freqs a-key missing) a-val)) a-freqs)]
    (reduce + distances))))

Clojure has a built in frequencies function, which we can use to turn the string under test into a map of actual character observations.

Measures the similarity between a string and a target corpus of string. 'Does the string 'fit in' with the other strings?

(defn chi-sq-str-fit
  [rel-freq str]
  (let [chars   (char-array (str/lower-case str))
        e-freqs (utils/map-vals (partial * (count chars)) rel-freq)
        a-freqs (frequencies chars)]
    (* -1 (chi-square-distance e-freqs a-freqs))))

Relative character frequencies in the English language are well known. Here is one such map from Wikipedia:

Source: Wikipedia/Letter_frequency

(def eng-char-rel-freq
  { \a 0.08167 \b 0.01492 \c 0.02782 \d 0.04253
   \e 0.12702 \f 0.02228 \g 0.02015 \h 0.06094
   \i 0.06966 \j 0.00153 \k 0.00772 \l 0.04025
   \m 0.02406 \n 0.06749 \o 0.07507 \p 0.01929
   \q 0.00095 \r 0.05987 \s 0.06327 \t 0.09056
   \u 0.02758 \v 0.00978 \w 0.02360 \x 0.00150
   \y 0.01974 \z 0.00074 })

(def str-englishness (partial chi-sq-str-fit eng-char-rel-freq))

There is one problem with the above approach: it ignores punctiation. It turns out the plaintext of many of these solutions contains lots of spaces, enough that when it was reused in later challenges the wrong result was sometimes returned.

How can we include \space and other commmon chars in the frequency map?

We could juggle the numbers or find another map that includes punctuation, but I thought it would be fun to produce my own, since we have this handy frequencies function:

Reads text files in data/training-text returning a single string with every file concatonated. _This project is not distributed with any training data; please supply your own by pasting text files in data/training-text_

(def training-text
  (let [contents  (file-seq (clojure.java.io/file "./data/training-text"))
        filenames (filter #(.isFile %) contents)]
    (str/join (map slurp filenames))))

Given some text and some allowed characters from that text, returns a map of the relative frequency of each character.

(defn custom-char-rel-freq
  ([text] (custom-char-rel-freq text #"[a-z ']"))
  ([text regex]
   (let [sanitized (str/lower-case text)
         chars     (map #(char (first %)) (re-seq regex sanitized))
         freqs     (frequencies chars)
         rel-freqs (utils/map-vals (fn [v] (/ v (count chars))) freqs)]
     rel-freqs)))

(def trained-char-rel-freq (custom-char-rel-freq training-text))

(def str-fitness (partial chi-sq-str-fit trained-char-rel-freq))

What shall we use for training data? Well given this quote:

An appreciation for early-90's MTV hip-hop can't hurt either.

...let's create a frequency map of Vanilla Ice lyrics. The results are as follows:

Expected relative frequencies of each letter in a Vanilla Ice lyric

(def vanilla-ice-rel-freq
  { \a 0.06212 \b 0.01350 \c 0.02465 \d 0.02556
    \e 0.08375 \f 0.01177 \g 0.01883 \h 0.03980
    \i 0.06460 \j 0.00380 \k 0.01630 \l 0.03510
    \m 0.02448 \n 0.05429 \o 0.07037 \p 0.01495
    \q 0.00032 \r 0.03612 \s 0.04152 \t 0.07494
    \u 0.02708 \v 0.00732 \w 0.01795 \x 0.00162
    \y 0.02644 \z 0.00150 \' 0.01737 \space 0.18394 })

Measures the likelyhood of a string being a Vanilla Ice lyric, using a chi-squared test against a relative character frequency map trained on lyrics.

(def str-iciness
  (partial chi-sq-str-fit vanilla-ice-rel-freq))

(t/deftest test-str-iciness
  (let [icy "Now that the party is jumping\n"
        eng "I have of late, but wherefore I know not, lost all my mirth"]
    (t/is (> (str-iciness icy) (str-iciness eng)))))

XORs a bytestream against the given char

(defn single-char-xor
  [bytestream char]
  (let [b1 bytestream
        b2 (map byte (repeat char))]
    (map bit-xor b1 b2)))

XOR the bytstream against the char then evaluate its fitness

(defn xor-with-score
  ([bytestream char] (xor-with-score bytestream char str-iciness))
  ([bytestream char fitness-fn]
  (let [xored (single-char-xor bytestream char)]
    {:in bytestream
     :out xored
     :char char
     :score (fitness-fn (utils/bytes-to-str xored))})))

(defn decode-single-char-xor
  [bytestream]
  (let [candidates (map (partial xor-with-score bytestream) utils/printable-ascii-chars)
        sorted     (sort-by :score candidates)
        winner     (last sorted)]
    winner))

Single-byte XOR cipher

(defn decode-single-char-xor-encoded-hex-str
  [str]
  (let [bytes  (utils/hex-to-bytes str)
        winner (decode-single-char-xor bytes)]
    (-> winner
        (update :out utils/bytes-to-str)
        (assoc :in str))))

(t/deftest test-decode-single-char-xor-encoded-hex-str
  (let [hex-str "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
        result  (decode-single-char-xor-encoded-hex-str hex-str)]
    (t/is (= (:out result) "Cooking MC's like a pound of bacon"))
    (t/is (= (:char result) \X))))

hale.cryptopals.set1.challenge4

toc

Detect single-character XOR

One of the 60 character strings in this file has been encrypted by single-character XOR. Find it.

(Your code from #3 should help.)

(ns hale.cryptopals.set1.challenge4
  (:require [hale.cryptopals.set1.challenge3 :as challenge3]
            [clojure.test :as t]
            [clojure.string :as str]))

From a seq of hex strings, finds and decodes the one which has been encrypted by single-character XOR

(defn detect-single-char-xor
  [strings]
  (last (sort-by :score (map challenge3/decode-single-char-xor-encoded-hex-str strings))))

(t/deftest test-detect-single-char-xor
  (let [strings (str/split (slurp "data/s1c4.txt") #"\s")
        result  (detect-single-char-xor strings)]
    (t/is (= (:in result) "7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f"))
    (t/is (= (:out result) "Now that the party is jumping\n"))
    (t/is (= (:char result) \5))))

hale.cryptopals.set1.challenge5

toc

Implement repeating-key XOR

Here is the opening stanza of an important work of the English language:

Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal

Encrypt it, under the key "ICE", using repeating-key XOR.

In repeating-key XOR, you'll sequentially apply each byte of the key; the first byte of plaintext will be XOR'd against I, the next C, the next E, then I again for the 4th byte, and so on.

It should come out to:

0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272
a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

Encrypt a bunch of stuff using your repeating-key XOR function. Encrypt your mail. Encrypt your password file. Your .sig file. Get a feel for it. I promise, we aren't wasting your time with this.

(ns hale.cryptopals.set1.challenge5
  (:require [clojure.test :as t]
            [hale.cryptopals.utils :as utils]))

XOR-encode an input string using the given key

(defn repeating-key-xor
  [key str]
  (let [b1    (utils/str-to-bytes str)
        b2    (flatten (repeat (map byte (char-array key))))
        xored (map bit-xor b1 b2)]
    (utils/bytes-to-hex xored)))

(t/deftest test-repeating-key-xor
  (let [input "Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
        key   "ICE"
        result (repeating-key-xor key input)]
    (t/is (= result
             (str "0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a262263"
                  "24272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b2028"
                  "3165286326302e27282f")))))

hale.cryptopals.set1.challenge6

toc

Break repeating-key XOR

It is officially on, now. This challenge isn't conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between this is a test and wokka wokka!!! is 37. Make sure your code agrees before you proceed.
For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
Solve each block as if it was single-character XOR. You already have code to do this.
For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

(ns hale.cryptopals.set1.challenge6
  (:require [clojure.test :as t]
            [clojure.string :as str]
            [hale.cryptopals.utils :as utils]
            [hale.cryptopals.set1.challenge1 :as base64]
            [hale.cryptopals.set1.challenge3 :as challenge3]))

1. 2. 3. 4. Find the key size

(defn- edit-distance-bytes
  [b1 b2]
  (let [xored      (map bit-xor b1 b2)
        bit-counts (map #(Integer/bitCount %) xored)]
    (reduce + bit-counts)))

Calculate the edit distance (number of differing bits) between two strings

(defn edit-distance
  [s1 s2]
  (let [b1 (utils/str-to-bytes s1)
        b2 (utils/str-to-bytes s2)]
    (edit-distance-bytes b1 b2)))

(t/deftest test-edit-distance
  (t/is (= (edit-distance "this is a test" "wokka wokka!!!") 37)))

Score a given keysize based on the hamming-distance of its application against the bytestream

(defn evaluate-key-size
  [bytes ks]
  (let [byte-pairs (partition 2 (partition ks bytes))
        scores     (flatten (map #(apply edit-distance-bytes %) byte-pairs))
        avg-score  (/ (apply + scores) (count scores))
        normalized (/ avg-score ks)]
    {:score normalized :ks ks}))

(defn determine-key-size
  [bytes n]
  (let [candidates (range 2 n)
        scores (map #(evaluate-key-size bytes %) candidates)]
    (apply min-key :score scores)))

(t/deftest test-determine-key-size
  (t/is (= 29 (:ks (determine-key-size (base64/base64-decode (slurp "data/s1c6.txt")) 50)))))

5. 6. 7. 8. Find the key

(defn transpose [m] (apply mapv vector m))

Given bytes encrypted with repeating-key XOR, find the key

(defn find-repeating-xor-key
  [bs]
  (let [keysize   (:ks (determine-key-size bs 50))
        ;; _ (println (str "best guess key size is " keysize " (tried up to 50)"))
        blocks    (transpose (partition keysize bs))
        decrypted (map (partial challenge3/decode-single-char-xor) blocks)]
    (apply str (map :char decrypted))))

(t/deftest test-find-repeating-xor-key
  (t/is (= "Terminator X: Bring the noise"
         (find-repeating-xor-key (base64/base64-decode (slurp "data/s1c6.txt"))))))

Decodes input bytestream by guessing the key.

(defn decrypt-repeating-key-xor
  [bs1]
  (let [key   (find-repeating-xor-key bs1)
        bs2   (flatten (repeat (map byte (char-array key))))
        xored (map bit-xor bs1 bs2)]
    (utils/bytes-to-str xored)))

Break repeating-key XOR

(def decrypt-repeating-key-xor-base64
  (comp decrypt-repeating-key-xor base64/base64-decode))

(t/deftest test-decrypt-repeating-key-xor
  (let [ciphertext (slurp "data/s1c6.txt")
        e-lines (str/split (slurp "data/s1c6.solution.txt") #"\n")
        a-lines (str/split (decrypt-repeating-key-xor-base64 ciphertext) #"\n")]
    (t/is (every? (fn [[expected actual]] (= expected actual)) (zipmap e-lines a-lines)))))

hale.cryptopals.set1.challenge7

toc

AES in ECB mode

The Base64-encoded content in this file has been encrypted via AES-128 in ECB mode under the key:

"YELLOW SUBMARINE".

(case-sensitive, without the quotes; exactly 16 characters; I like "YELLOW SUBMARINE" because it's exactly 16 bytes long, and now you do too).

Decrypt it. You know the key, after all.

Easiest way: use OpenSSL::Cipher and give it AES-128-ECB as the cipher.

Do this with code. You can obviously decrypt this using the OpenSSL command-line tool, but we're having you get ECB working in code for a reason. You'll need it a lot later on, and not just for attacking ECB.

(ns hale.cryptopals.set1.challenge7
  (:require [hale.cryptopals.utils :as utils]
            [hale.cryptopals.set1.challenge1 :as base64]
            [clojure.test :as t])
  (:import [javax.crypto Cipher]
           [javax.crypto.spec SecretKeySpec]))

(defn decrypt-aes-ecb
  [key byte-stream]
  (let [key-spec  (SecretKeySpec. (.getBytes key) "AES")
        cipher    (Cipher/getInstance "AES/ECB/NoPadding")
        _         (.init cipher (int Cipher/DECRYPT_MODE) key-spec)
        _ (println (count byte-stream))
        decrypted (.doFinal cipher (byte-array byte-stream))]
    decrypted))

(defn decrypt-aes-ecb-base64 [key b64]
  (decrypt-aes-ecb key (base64/base64-decode b64)))

Decrypt AES in ECB mode

(def decrypt-aes-ecb-base64-to-str
  (comp utils/bytes-to-str decrypt-aes-ecb-base64))

(t/deftest test-decrypt-aes-ecb
  (let [key "YELLOW SUBMARINE"
        ciphertext (slurp "data/s1c7.txt")
        plaintext (slurp "data/s1c7.solution.txt")]
    (t/is (= plaintext (decrypt-aes-ecb-base64-to-str key ciphertext)))))

hale.cryptopals.set1.challenge8

toc

Detect AES in ECB mode.

In this file are a bunch of hex-encoded ciphertexts. One of them has been encrypted with ECB. Detect it.

Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.

(ns hale.cryptopals.set1.challenge8
  (:require [hale.cryptopals.set1.challenge3 :refer [chi-square-distance]]
            [hale.cryptopals.utils :as utils]
            [clojure.string :as str]
            [clojure.test :as t]))

Cyphertext known to be encoded with AES in ECB mode will exhibit repeating blocks of bytes (if the plaintext has any repetition). A good heuristic for AES in ECB mode would therefore be 'contains repetition'.

Info about the source data:

64 strings to test
Each string is 60 chars long (hex encoded, so that's 30 bytes)

We already have that chi-squared function which can measure the deviation of observations from what is expected.

The opposite of repetition is randomness. Randomness can be defined here as 'each byte is equaly likey to appear'. I.e. the bytes should be uniformally distributed. The least-random ciphertext will therefore be the set of bytes that is most deviant from the uniform distribution (where every outcome is equally likely).

Standard uniform distribution for a set of bytes

(def uniform-bytes-rel-freq-map
  (zipmap (map byte (range 0 128))
          (repeat 128 (/ 1 128))))

Set 1 :: Challenge 8 :: Detect AES in ECB mode

(defn detect-aes-in-ecb-mode
  [strs]
  (let [streams (map utils/hex-to-bytes strs)
        e-freqs (utils/map-vals (partial * 30) uniform-bytes-rel-freq-map)
        score   (fn [bs] (chi-square-distance e-freqs (frequencies bs) (/ 1 256)))
        scores  (map #(hash-map :in % :score (score %)) streams)
        winner  (apply max-key :score scores)]
    (utils/bytes-to-hex (:in winner))))

(t/deftest test-detect-aes-in-ecb-mode
  (let [strs (str/split (slurp "data/s1c8.txt") #"\n")
        expected (nth strs 132)
        actual (detect-aes-in-ecb-mode strs)]
    (t/is (= expected actual))))

hale.cryptopals.set2

toc

Crypto Challenge Set 2

This is the first of several sets on block cipher cryptography. This is bread-and-butter crypto, the kind you'll see implemented in most web software that does crypto.

This set is relatively easy. People that clear set 1 tend to clear set 2 somewhat quickly.

Three of the challenges in this set are extremely valuable in breaking real-world crypto; one allows you to decrypt messages encrypted in the default mode of AES, and the other two allow you to rewrite messages encrypted in the most popular modes of AES.

(ns hale.cryptopals.set2)

hale.cryptopals.set2.challenge9

toc

Implement PKCS#7 padding

A block cipher transforms a fixed-sized block (usually 8 or 16 bytes) of plaintext into ciphertext. But we almost never want to transform a single block; we encrypt irregularly-sized messages.

One way we account for irregularly-sized messages is by padding, creating a plaintext that is an even multiple of the blocksize. The most popular padding scheme is called PKCS#7.

So: pad any block to a specific block length, by appending the number of bytes of padding to the end of the block. For instance,

"YELLOW SUBMARINE"

... padded to 20 bytes would be:

"YELLOW SUBMARINE\x04\x04\x04\x04"

(ns hale.cryptopals.set2.challenge9
  (:require [clojure.test :as t]
            [hale.cryptopals.utils :as utils]))

(defn pad-block
  ([block] (pad-block block 16 0x04))
  ([block blocksize] (pad-block block blocksize 0x04))
  ([block blocksize pad]
  (let [pad-length (- blocksize (count block))]
    (flatten (conj (vec block) (repeat pad-length pad))))))

(t/deftest test-pad-block
  (t/is (= (map char [\Y \E \L \L \O \W \space \S \U \B
                       \M \A \R \I \N \E 0x04 0x04 0x04 0x04])
           (map char (pad-block (utils/str-to-bytes "YELLOW SUBMARINE") 20)))))

hale.cryptopals.set2.challenge10

toc

Implement CBC mode

CBC mode is a block cipher mode that allows us to encrypt irregularly-sized messages, despite the fact that a block cipher natively only transforms individual blocks.

In CBC mode, each ciphertext block is added to the next plaintext block before the next call to the cipher core.

The first plaintext block, which has no associated previous ciphertext block, is added to a "fake 0th ciphertext block" called the initialization vector, or IV.

Implement CBC mode by hand by taking the ECB function you wrote earlier, making it encrypt instead of decrypt (verify this by decrypting whatever you encrypt to test), and using your XOR function from the previous exercise to combine them.

The file here is intelligible (somewhat) when CBC decrypted against "YELLOW SUBMARINE" with an IV of all ASCII 0 (\x00\x00\x00 &c)

(ns hale.cryptopals.set2.challenge10
  (:require [hale.cryptopals.utils :as utils]
            [hale.cryptopals.set1.challenge1 :as base64]
            [hale.cryptopals.set1.challenge7 :refer [decrypt-aes-ecb-base64-to-str]]
            [clojure.test :as t])
  (:import [javax.crypto Cipher]
           [javax.crypto.spec SecretKeySpec]))

(defn encrypt-aes-ecb
  [key bs]
  (let [key-spec  (SecretKeySpec. (.getBytes key) "AES")
        cipher    (Cipher/getInstance "AES/ECB/NoPadding")
        _         (.init cipher (int Cipher/ENCRYPT_MODE) key-spec)
        encrypted (.doFinal cipher (byte-array bs))]
    encrypted))

(defn encrypt-aes-ecb-str-to-base64
  [key str]
  (->> str
       utils/str-to-bytes
       ((partial encrypt-aes-ecb key))
       base64/base64-encode))

(let [key "ICEICEBABY123456" plaintext "YELLOW SUBMARINE" encrypted (encrypt-aes-ecb-str-to-base64 key plaintext) _ (println encrypted) _ (println (count encrypted)) decrypted (decrypt-aes-ecb-base64-to-str key encrypted) ])

(decrypt-aes-ecb-base64-to-str encrypted key)

(t/deftest encrypt-aes-in-ecb-mode
  (let [key "YELLOW SUBMARINE"
        ciphertext (slurp "data/s1c7.txt")
        plaintext (slurp "data/s1c7.solution.txt")]
    (t/is (= (encrypt-aes-ecb-str-to-base64 key plaintext) ciphertext))))

I found this image on Wikipedia helpful for understanding how CBC encryption works:

So we need a function that takes an IV + plaintext and returns ciphertext.

bs -- bytestream input plaintext iv -- initialization vector blocks -- number of bytes per block

(defn encrypt-aes-ecb
  [bs iv blocks])

(def input (base64/base64-decode (slurp "data/s2c10.txt")))

hale.cryptopals.utils

toc

(ns hale.cryptopals.utils
  (:require [clojure.test :as t]
            [clojure.string :as str]))

FIXME: UNSAFE -- interprets string as hex using read-string

(defn hex-to-bytes
  [hex] (let [chars (partition 2 hex)
              parse-chars (fn [[c1 c2]] (read-string (str "0x" c1 c2)))]
          (map parse-chars chars)))

(t/deftest test-hex-to-bytes
  (t/is (= (hex-to-bytes "4d616e")
         (map byte [0x4d 0x61 0x6e])
         (map byte [77 97 110])
         (map byte [2r1001101 2r1100001 2r1101110]))))

Given a list of bytes, return a string representation of those bytes in binary form (ones-and-zeros)

(defn print-bytes
  [byte & rest]
  (let [stringify (fn [b]      (Integer/toBinaryString b))
        pad       (fn [string] (format "%8s" string))
        zero-pad  (fn [string] (str/replace string " " "0"))]
    (map (comp zero-pad pad stringify) (conj rest byte))))

(t/deftest test-print-bytes
  (t/is (= (apply print-bytes (hex-to-bytes "4d616e"))
         ["01001101" "01100001" "01101110"])))

Also known as 'unhexify'

(defn hex-to-str
  [hex]
  (let [bytes (hex-to-bytes hex)
        chars (map char bytes)]
    (apply str chars)))

(defn bytes-to-str [bytes] (apply str (map char bytes)))

(defn map-vals [f m]
  (into {} (for [[k v] m] [k (f v)])))

Source: https://en.wikipedia.org/wiki/ASCII#Printable_characters

(def printable-ascii-chars
  (map char (range 32 127)))

(defn square [x] (* x x))

(defn str-to-bytes
  [str]
  (let [chars (map char str)]
    (map byte chars)))

(defn bytes-to-hex
  [bytes]
  (apply str (map (partial format "%02x") bytes)))