Simple Requests Interface

Author: Szymon Lipiński
Published at: 2016-10-25

The requests library is quite good: it hides the ugly standard Python HTTP libraries behind a simple API. Today I found an even nicer way to use requests itself.

The Basic Requests Usage

The part of the requests interface that I use most is simple:

import requests as r

def download_data(url):
    res = r.get(url)
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error

Can you spot an error there?

Examples of Requests Errors

The requests library also raises some exceptions, which are not caught there.

Missing Server

When calling a URL for which there is no server responding:

res = r.get('http://xxx.simononsoftware.com')

you will get an exception:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: /nopage (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7faeecc75860>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Bad URL

When calling a URL with a missing resource:

res = r.get('http://www.simononsoftware.com/nopage')

you will get no exception; however, the status_code will not be 200:

print(res.status_code)
404

So it would be nice to have one unified way of getting the error.

A Better Requests Usage

To catch the exceptions as well, the better code is:

import requests as r

def download_data(url):
    try:
        res = r.get(url)
    except Exception:
        # do something on exception, then stop: res is not set here
        return

    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error

The Return Value

However, I also need to store the result of every HTTP fetch in a database, especially the error information, so I need to collect the errors rather than just react to them. Some errors are raised by the libraries that requests uses, and some are returned by the HTTP server. I need all of them, so let’s store them in an object of a class like this one:

from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

The two fields, data and error, are mutually exclusive: only one of them is filled, and the other is None.
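To illustrate how a caller might rely on that rule, here is a minimal sketch. The describe helper is hypothetical (it is not part of requests), and it assumes that data holds text:

```python
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def describe(response):
    """Return a short summary of a Response (hypothetical helper)."""
    # Only one field is filled, so checking error is enough.
    if response.error is None:
        return 'OK: {} characters of data'.format(len(response.data))
    return 'FAILED: {}'.format(response.error)


ok = Response(data='<html></html>', error=None)
failed = Response(data=None, error='404 Client Error')

print(describe(ok))      # OK: 13 characters of data
print(describe(failed))  # FAILED: 404 Client Error
```

The caller needs just one check to tell success from failure, which is the point of keeping the two fields exclusive.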

I have added the return value to the download_data function:

import requests as r
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    try:
        res = r.get(url)
    except Exception as e:
        return Response(data=None, error=e)

    if res.status_code == 200:
        return Response(data=res, error=None)
    else:
        return Response(data=None, error=res.reason)

I think the above function is ugly: there are just too many checks and returns. It would be great to have just two returns: one for failure and one for success.

The Unified Interface

After digging through the requests code and documentation, I found that all the exceptions the library explicitly raises are in the requests.exceptions module and inherit from the RequestException class. This class has fields containing the request and the response objects (if there was any response).
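The hierarchy can be checked without any network access. This is only a small sketch: the list below contains a handful of the exception classes, not all of them, and the ConnectionError instance is created by hand just for illustration:

```python
from requests import exceptions as exc

# A few of the concrete exception classes that requests defines.
concrete = [exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.InvalidSchema]

# Each of them can be caught with a single "except RequestException".
print(all(issubclass(c, exc.RequestException) for c in concrete))  # True

# The request and response attributes exist on every instance;
# both are None when the exception is not tied to a request or response.
error = exc.ConnectionError('no server')
print(error.request, error.response)  # None None
```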

The response object also has a method, raise_for_status(), which raises an HTTPError (itself a subclass of RequestException) when the status code indicates a client or server error (4xx or 5xx).
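The behaviour of raise_for_status() can be tried offline as well. In this sketch the responses are built by hand instead of being fetched, and make_response and raises_http_error are hypothetical helpers written only for this demonstration:

```python
import requests
from requests.exceptions import HTTPError


def make_response(status_code, reason):
    """Build a Response by hand so that no network access is needed."""
    res = requests.Response()
    res.status_code = status_code
    res.reason = reason
    res.url = 'http://example.com/'
    return res


def raises_http_error(res):
    """Return True if raise_for_status() raises HTTPError for res."""
    try:
        res.raise_for_status()
    except HTTPError:
        return True
    return False


print(raises_http_error(make_response(200, 'OK')))                     # False
print(raises_http_error(make_response(304, 'Not Modified')))           # False
print(raises_http_error(make_response(404, 'Not Found')))              # True
print(raises_http_error(make_response(500, 'Internal Server Error')))  # True
```

Only the 4xx and 5xx responses raise; other codes, such as 304, pass through silently.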

Let’s use all this information. The resulting code looks like this:

import requests as r
from collections import namedtuple
from requests.exceptions import RequestException

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    """Returns Response object with data, or error information.
    """
    try:
        res = r.get(url)
        res.raise_for_status()
        return Response(data=res.text, error=None)
    except RequestException as e:
        return Response(data=None, error=str(e))

Then, when I call this function for good and bad URLs:

for url in ['http://www.simononsoftware.com/nopage',
            'http://xxx.simononsoftware.com',
            'htttt://example.com',
            'http://www.simononsoftware.com']:
    print(url)
    print(download_data(url))

I get:

http://www.simononsoftware.com/nopage
Response(data=None, error='404 Client Error: Not Found for url: http://www.simononsoftware.com/nopage')
http://xxx.simononsoftware.com
Response(data=None, error="HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3215aecb38>: Failed to establish a new connection: [Errno -2] Name or service not known',))")
htttt://example.com
Response(data=None, error="No connection adapters were found for 'htttt://example.com'")
http://www.simononsoftware.com
Response(data='<HUGE HTML HERE>', error=None)
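
Since both fields are now plain text or None, such results are easy to store in a database. Below is a rough sketch using an in-memory SQLite database. The fetches table, its columns, and the save_result function are assumptions made for this example, not the schema I actually use:

```python
import sqlite3
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def save_result(conn, url, response):
    """Store one fetch result; one of data/error is expected to be None."""
    conn.execute(
        'INSERT INTO fetches (url, data, error) VALUES (?, ?, ?)',
        (url, response.data, response.error),
    )
    conn.commit()


# Hypothetical schema: NULL in data or error marks the unused field.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fetches (url TEXT NOT NULL, data TEXT, error TEXT)')

save_result(conn, 'http://example.com/',
            Response(data='<html></html>', error=None))
save_result(conn, 'http://example.com/nopage',
            Response(data=None, error='404 Client Error'))

failed = conn.execute(
    'SELECT url, error FROM fetches WHERE error IS NOT NULL').fetchall()
print(failed)  # [('http://example.com/nopage', '404 Client Error')]
```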