Hash length extension attack example#

A length extension attack allows an attacker that intercepted a hash of a message H(message) to produce H(message || extension) without knowing the original message. If the hash is used as a basic Message Authentication Code by prepending a secret key H(key || message), this attack can be used to generate a valid MAC for a new message message || pad || extension, where pad is a especially crafted string and extension is completely attacker-controlled.

An example of this attack on the MD5 hash function can be found in this article. A slightly modified version can be found below to play with.

First, we need an implementation of MD5. The article and this example use the pymd5 module below, as it exposes some internal methods that are useful to carry out this attack.

(If you’re reading this in the documentation, the module code has been hidden as it’s quite long. The full version of this notebook can be found in the source code of the documentation.)

Then we define a password and a message to illustrate this.

[2]:
# The secret password used to validate the message.
password = "super secret password"
# A super important message.
message = "Bring two croissants and a brioche"

We also compute our legitimate MAC.

[3]:
mac = md5(password + message).digest()

Now we define a function to test if a digest matches a given message.

[4]:
def verify(msg: bytes, digest: bytes) -> bool:
    """Checks that the digest is valid for the given message.

    Args:
        msg: The received message.
        digest: The accompanying digest that should correspond to
            MD5(password || msg).

    Returns:
        True if the digest is valid for the given message, i.e.
        MD5(password || msg) == digest.
    """
    h = md5()
    h.update(password.encode() + msg)
    return h.digest() == digest

It’s time to implement the attack.

We first need to know the length of password || message. In this case we assume that we know the actual message sent and the length of the password (but not the password itself). If we didn’t know the length, we would have to test different lengths until one passes verify.

input_len = password_len + len(message)

From this, we can deduce how many bits have already been processed by the algorithm. This is the length of the message plus the length of the padding that was added by the algorithm. The pymd5 implementation exposes the padding function that returns the padding that is added depending on the length of input.

processed_bits = (input_len + len(padding(input_len * 8))) * 8

We can now initialize a new md5 object using the known state and the number of processed bits:

h = md5(state=digest, count=processed_bits)

We update this new instance with the message of our choice and produce the new digest, which is our new MAC:

h.update(b"and 42 pains au chocolat")
new_mac = h.digest()

Since we know the original message, we compute the new message that corresponds to our new digest:

new_message = message.encode() + padding(input_len * 8) + b"and 42 pains au chocolat"

Putting this all together gives us:

[5]:
def attack(message: str, password_len: int, mac: bytes) -> tuple[bytes, bytes]:
    """Implements a hash length extension attack on MD5.

    Args:
        message: The known message to extend.
        password_len: The length in bytes of the secret password.
        mac: The intercepted MAC == MD5(password || message).

    Returns:
        A tuple containing the new message and the new digest.
    """
    input_len = password_len + len(message)
    processed_bits = (input_len + len(padding(input_len * 8))) * 8

    h = md5(state=mac, count=processed_bits)

    h.update(b"and 42 pains au chocolat")

    new_mac = h.digest()

    new_message = (
        message.encode() + padding(input_len * 8) + b"and 42 pains au chocolat"
    )

    return new_message, new_mac

We can now test the attack:

[6]:
password_length = len(password)

new_message, new_mac = attack(message, password_length, mac)
if verify(new_message, new_mac):
    print("Success!")
    print(f"{message = }")
    print(f"{mac.hex() = }")
    print(f"{new_message = }")
    print(f"{new_mac.hex() = }")
else:
    print("Failed!")
Success!
message = 'Bring two croissants and a brioche'
mac.hex() = '356177536596b1dda0e9767180683aea'
new_message = b'Bring two croissants and a brioche\x80\xb8\x01\x00\x00\x00\x00\x00\x00and 42 pains au chocolat'
new_mac.hex() = 'ce6c189011a7a8508a59e7f33104d401'

We see that while the new message has some “garbage” in the middle, it passes the verification! Hopefully the receiver will think that it’s just a display bug and fulfils our order. ;)

Side-note: while the introduction says that knowledge of the message is not required, we did use it to compute the new message to verify. This is because of the example we used to illustrate this attack that emulates a possible use of this construction. Without knowledge of the message, we can still craft a valid hash as long as we know/can guess the length of the password and the message, but then the receiver in our example would have no way of verifying this. :)