How Not to Send PDF Files

by Szymon Lipiński
tags: programming

I’ve Got a PDF

My insurance company used to send me a paper letter (you know, the ancient way of communication) every year, right before the anniversary. Usually they offered some kind of change, like this:

Hey, we propose that you pay x% more from next month,

in return we offer you an x% higher insured amount.

And that’s perfectly fine. Really.

This year they sent me an email, with an attached PDF file. In the email they wrote:

We have attached the anniversary documents.

The file is protected with a password, which is the birth date of the insured person, in the format DDMMYYYY.

Yea… my first thought was…

why do they even bother to add the password.

My next thought was:

what if…

The Email Problem

The main problem is that the email is not signed. There is no cryptographic signature attached, so I’m not sure if it is what they really sent. I’m not even sure if they sent me anything.

The PDF Problem

The PDF is not signed either.

The Pathetic Encryption

The PDF encryption is just way too easy to crack. The DDMMYYYY date format gives 365 or 366 possible keys for each year. Let’s assume that the insured person was born between 1940 and the start of 2019 (so is between 0 and 78 years old).

This means that we have to check fewer than 29k possible passwords.

I’m not sure yet how fast decrypting PDF files is, but let’s assume it takes 1 second per password. This means that the maximum cracking time is about 8 hours.
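The estimate can be checked with a few lines of Python. This is only a sketch, assuming the search covers every date from 1 January 1940 up to, but not including, 1 January 2019, and using the guessed cost of 1 second per attempt. The variable names are made up for this example.

```python
from datetime import date

# Assumed range: every birth date from 1940-01-01 up to,
# but not including, 2019-01-01.
first = date(1940, 1, 1)
last_exclusive = date(2019, 1, 1)

# One candidate password per calendar day in the range.
candidates = (last_exclusive - first).days

# Guessed cost of a single attempt, in seconds.
seconds_per_attempt = 1.0
worst_case_hours = candidates * seconds_per_attempt / 3600

print("candidate passwords: {}".format(candidates))
print("worst case: {:.1f} hours".format(worst_case_hours))
```

The count comes straight from subtracting two dates, so leap years are handled without any special cases.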

Let’s Decrypt

To decrypt the PDF I took the first program that I found information about. It was qpdf. The syntax for decrypting a PDF file and creating a new one without the password is:

qpdf --password=ABC --decrypt input.pdf output.pdf

When the password is wrong, the program exits with status 2 and prints an error message:

λ /tmp/pdf/ qpdf --password=ABC --decrypt input.pdf output.pdf
input.pdf: invalid password
λ /tmp/pdf/ echo $?
2
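The exit status is all a script needs in order to tell a wrong password from the right one. Here is a minimal sketch of such a check. The function name try_password and its run parameter are made up for this example, and it relies only on the qpdf options and the exit status shown above. It treats exit status 0 as success and anything else as failure.

```python
import subprocess


def try_password(password, inf, outf, run=subprocess.run):
    """Return True if qpdf exits with status 0 for the given password.

    The run parameter is hypothetical: it only exists so that the check
    can be exercised without qpdf being installed.
    """
    # Same command line as above, passed as a list so no shell is involved.
    cmd = ["qpdf", "--password=" + password, "--decrypt", inf, outf]
    result = run(cmd)
    # A wrong password gave exit status 2 in the session above.
    return result.returncode == 0
```

With qpdf installed, try_password("ABC", "input.pdf", "output.pdf") should return False for the file described here, since ABC is not the password.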

So now it’s enough to generate all the passwords.

Once I had all the information, I wrote this simple Python script:

from datetime import datetime as dt
from datetime import timedelta
from subprocess import call


inf = "input.pdf"
outf = "output.pdf"

# Try every date from START_DATE up to, but not including, END_DATE.
START_DATE = dt.strptime('01011940', '%d%m%Y').date()
END_DATE = dt.strptime('01012019', '%d%m%Y').date()

d = START_DATE

while d < END_DATE:
    print(d)
    # The password is the date written as DDMMYYYY.
    password = d.strftime("%d%m%Y")
    # Pass the arguments as a list, so no shell is involved.
    res = call(["qpdf", "--password=" + password, "--decrypt", inf, outf])
    # qpdf exits with 0 when the password was accepted.
    if res == 0:
        print("Done, password is {}".format(password))
        break
    d = d + timedelta(days=1)

The script output is:

λ /tmp/pdf/ time python decode.py            
1940-01-01
input.pdf: invalid password
1940-01-02
input.pdf: invalid password

...many lines later...

1979-10-10
Done, password is 10101979
python decode.py  37,72s user 12,19s system 103% cpu 48,148 total

The decrypted file, without any password, is now in output.pdf.
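The timing line in the output also shows how far off the guess of 1 second per password was. Here is a rough calculation, assuming the script started at 1940-01-01 and tried one password per day up to and including 1979-10-10, and taking the 48,148 s wall-clock total reported by time. The variable names are made up for this example.

```python
from datetime import date

# Attempts made: every day from the start date up to and including the hit.
attempts = (date(1979, 10, 10) - date(1940, 1, 1)).days + 1

# Wall-clock time reported by `time` in the output, in seconds.
elapsed_seconds = 48.148

rate = attempts / elapsed_seconds

# Whole search space: 1940-01-01 up to, but not including, 2019-01-01.
all_candidates = (date(2019, 1, 1) - date(1940, 1, 1)).days
full_scan_seconds = all_candidates / rate

print("attempts: {}".format(attempts))
print("attempts per second: {:.0f}".format(rate))
print("full range: {:.0f} seconds".format(full_scan_seconds))
```

On these numbers that is roughly 300 attempts per second, so even trying the whole range would take well under two minutes rather than hours.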

Final Remarks

For The Curious