How Not to Send PDF Files
I’ve Got a PDF
An insurance company used to send me a paper letter (you know, the ancient way of communication) every year, right before the anniversary. Usually they offered some kind of change, like this:
Hey, we offer that you will pay x% more money from the next month,
in return we offer you x% more for the insurance amount.
And that’s perfectly fine. Really.
This year they sent me an email, with an attached PDF file. In the email they wrote:
We have attached the anniversary documents.
The file is protected with a password, which is the birth date of the insured person, in the format DDMMYYYY.
Yea… my first thought was…
why do they even bother to add the password.
My next thought was:
what if…
The Email Problem
The main problem is that the email is not signed. There is no cryptographic signature attached, so I’m not sure if it is what they really sent. I’m not even sure if they sent me anything.
The PDF Problem
The PDF is not signed either.
The Pathetic Encryption
The PDF encryption is just way too easy to crack. The DDMMYYYY date format gives 365 or 366 possible keys for each year. Assuming that the insured person was born between 1940 and 2019 (so is between 0 and 78 years old), we have to check fewer than 29k possible passwords.
I don’t know yet how fast a PDF password can be checked, but let’s assume it takes 1 second per password. This means that the worst-case cracking time is about 8 hours.
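The estimate is easy to double-check. A minimal sketch, assuming the same window as above (born from 1 January 1940 up to, but not including, 1 January 2019) and the guessed rate of one password per second; the names START, END and candidates are mine:

```python
from datetime import date

# Assumed search window: START inclusive, END exclusive.
START = date(1940, 1, 1)
END = date(2019, 1, 1)

# One candidate password per calendar day in the window.
candidates = (END - START).days
print(candidates)  # 28855

# Worst case at the assumed rate of 1 password per second.
worst_case_hours = candidates / 3600
print(round(worst_case_hours, 1))  # 8.0
```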
Let’s Decrypt
To decrypt the PDF I took the first program that I found information about. It was qpdf. The syntax for decrypting a PDF file and creating a new one without the password is:
qpdf --password=ABC --decrypt input.pdf output.pdf
When the password is wrong, the program exits with status 2 and prints an error message:
λ /tmp/pdf/ qpdf --password=ABC --decrypt input.pdf output.pdf
input.pdf: invalid password
λ /tmp/pdf/ echo $?
2
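The same check can be done from Python. This is only a small sketch, assuming qpdf is on the PATH; the helper names build_qpdf_command and password_opens and the run parameter are made up for illustration, and only the exit status is inspected:

```python
import subprocess


def build_qpdf_command(password, infile, outfile):
    """Return the qpdf invocation shown above as an argument list."""
    return ["qpdf", "--password={}".format(password), "--decrypt", infile, outfile]


def password_opens(password, infile, outfile, run=subprocess.call):
    """Return True when the command exits with status 0 for this password.

    `run` is a hypothetical hook (not part of qpdf) that takes the argument
    list and returns an exit status, so the logic can be tried without qpdf.
    """
    return run(build_qpdf_command(password, infile, outfile)) == 0
```

Passing the arguments as a list means the password never goes through a shell, so odd characters in it cannot be misinterpreted.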
So now it’s enough to generate all the passwords.
Once I had all the information, I wrote this simple Python script:
from datetime import datetime as dt
from datetime import timedelta
from subprocess import call
inf = "input.pdf"
outf = "output.pdf"
# Search window: START_DATE is tried first, END_DATE itself is not tried.
START_DATE = dt.strptime('01011940', '%d%m%Y').date()
END_DATE = dt.strptime('01012019', '%d%m%Y').date()
d = START_DATE
while d < END_DATE:
    print("{}".format(d))
    # Candidate password in DDMMYYYY form.
    password = d.strftime("%d%m%Y")
    # Arguments are passed as a list, so no shell is involved.
    res = call(["qpdf", "--password={}".format(password),
                "--decrypt", inf, outf])
    # qpdf exits with 0 once the password is accepted.
    if res == 0:
        print("Done, password is {}".format(password))
        break
    d = d + timedelta(days=1)
The script output is:
λ /tmp/pdf/ time python decode.py
1940-01-01
input.pdf: invalid password
1940-01-02
input.pdf: invalid password
...many lines later...
1979-10-10
Done, password is 10101979
python decode.py 37,72s user 12,19s system 103% cpu 48,148 total
The decoded file without any password is now in output.pdf.
Final Remarks
- If you have a company and you send an email with some serious documents, please, sign the email.
- Don’t send the PDFs “encrypted” with such a password.
- Better: don’t send such PDFs at all, have a website with at least two-factor authentication.
- If you really have to send a PDF, sign the email, sign the PDF, and encrypt it with a long password sent using another communication channel (the insurance company has my phone number, so sending a message to my phone with the password would be a much better idea).
- The average decryption speed was about 400 passwords per second.
- Writing the script, including searching for the information, took me 3 minutes, so the total time to decrypt such a professionally “encrypted” file was under 5 minutes.
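To put the last few remarks in numbers, here is a rough sketch that compares the birth-date keyspace with a random 20-character password at the roughly 400 guesses per second mentioned above. The helper names, the 62-character alphabet and the length of 20 are my assumptions for illustration, not a recommendation from any standard:

```python
import secrets
import string
from datetime import date

# Assumed alphabet: upper- and lower-case ASCII letters plus digits (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def random_password(length=20):
    """Build a password from cryptographically strong random choices."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def seconds_to_exhaust(keyspace, guesses_per_second=400):
    """Time needed to try every key at the given rate."""
    return keyspace / guesses_per_second


# Birth dates from 1940-01-01 up to, but not including, 2019-01-01.
date_keyspace = (date(2019, 1, 1) - date(1940, 1, 1)).days
random_keyspace = len(ALPHABET) ** 20

print(random_password())
print("birth dates: {:.0f} s".format(seconds_to_exhaust(date_keyspace)))
seconds_per_year = 365 * 24 * 3600
print("random: {:.1e} years".format(seconds_to_exhaust(random_keyspace) / seconds_per_year))
```

At that rate the whole birth-date keyspace falls in a bit over a minute, while the random password would take on the order of 10^25 years to exhaust.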
For The Curious
- I know, there is something wrong with me. Instead of just writing a password, I cracked it. Heh, I have a decrypted pdf and I still haven’t entered the password.
- No, 10-10-1979 is not my birthday, nor that of anyone I know, but 10101979 is a nice prime number.