A Simpler Requests Interface

Tue, Oct 25, 2016

by Szymon Lipiński

tags: python programming

Requests is quite a good Python library. It simplifies working with Python's ugly and terrible standard HTTP libraries. Still, the more I think about interfaces, the more places I see that could be improved, including in Requests itself.

The Basic Requests Usage

The main requests interface which I use is simple:

import requests as r

def download_data(url):
    res = r.get(url)
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error

Can you spot an error there?

Getting The Requests Errors

The requests library also raises some exceptions, which are not caught there.

Missing The Server

When calling a URL for which there is no server responding:

res = r.get('http://xxx.simononsoftware.com')

you will get an exception:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: /nopage (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7faeecc75860>: Failed to establish a new connection: [Errno -2] Name or service not known',))

A Bad URL

When calling a URL with a missing resource:

res = r.get('http://www.simononsoftware.com/nopage')

you will get no exception, but the status_code will not be 200:

print(res.status_code)
404

So it would be nice to have one unified way of getting the error.

Catching Errors

To catch all the errors we should check both: exceptions and status_code:

import requests as r

def download_data(url):
    try:
        res = r.get(url)
    except Exception:
        # do something on exception; res is not defined here, so stop
        return

    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error

The Return Value

I also need to store in a database all the results of the HTTP fetches, especially the error information. So I need to check for the errors. Some of the errors are caused by the libraries that requests uses. Some are returned by the HTTP server. I need to get all of them. Let’s store them in an object of a class like this one:

from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

The two fields, data and error, are mutually exclusive: exactly one of them is filled, and the other is None.
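To make the "only one field is filled" rule concrete, here is a minimal sketch of how calling code might consume such an object. The describe helper and the sample values are hypothetical, made up only for illustration:

```python
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def describe(response):
    """Return a short summary of a Response (hypothetical helper)."""
    # The caller needs a single check: either error is set, or data is.
    if response.error is not None:
        return 'failed: {}'.format(response.error)
    return 'ok: {} characters'.format(len(response.data))

ok_summary = describe(Response(data='<html></html>', error=None))
failed_summary = describe(Response(data=None, error='404 Not Found'))
print(ok_summary)
print(failed_summary)
```

The point of the design is that the caller never has to wrap anything in try/except; one `is not None` test is enough.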

I have added the return value to the download_data function:

import requests as r
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    try:
        res = r.get(url)
    except Exception as e:
        return Response(data=None, error=e)

    if res.status_code == 200:
        return Response(data=res, error=None)
    else:
        return Response(data=None, error=res.reason)

I think the above function is ugly. Too many checks and returns. It would be great to have just two returns: one for the failure, one for the success.

The Unified Interface

After digging through the requests code and documentation, it turned out that all the exceptions raised by the library are in the requests.exceptions module and inherit from the RequestException class. This class has fields containing the request and the response objects (if there was any response).
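The hierarchy can be checked without touching the network. The sketch below builds a bare Response object by hand (not something you would normally do) just to show the response field on the exception. The particular exception classes listed are my own pick, not a complete list:

```python
import requests
from requests import exceptions as rex

# A few of the specific exceptions; all should derive from RequestException.
specific = (rex.ConnectionError, rex.HTTPError, rex.Timeout, rex.InvalidSchema)
all_related = all(issubclass(exc, rex.RequestException) for exc in specific)
print(all_related)

# An exception created with a response keeps a reference to it.
fake_response = requests.Response()  # built by hand, no request is sent
fake_response.status_code = 404
with_response = rex.HTTPError('404 Client Error', response=fake_response)
print(with_response.response.status_code)

# When there was no response at all, the field is None.
without_response = rex.ConnectionError('could not connect')
print(without_response.response)
```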

The response object also has a raise_for_status() method, which raises an HTTPError (a subclass of RequestException) when the status code is a 4xx client error or a 5xx server error.
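A quick offline check of which status codes make raise_for_status() raise. This is only a sketch: the raises_for helper is hypothetical, and the Response objects are built by hand instead of coming from a real server:

```python
import requests
from requests.exceptions import HTTPError

def raises_for(status_code):
    """Return True if raise_for_status() raises for this status code."""
    # Build a bare Response by hand so no network access is needed.
    resp = requests.Response()
    resp.status_code = status_code
    try:
        resp.raise_for_status()
    except HTTPError:
        return True
    return False

results = {code: raises_for(code) for code in (200, 204, 301, 404, 500)}
print(results)
```

Only the 4xx and 5xx codes raise, so other successful and redirect codes pass through without an exception.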

Let’s use all this information to change the download_data function:

import requests as r
from collections import namedtuple
from requests.exceptions import RequestException

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    """Returns Response object with data, or error information.
    """
    try:
        res = r.get(url)
        res.raise_for_status()
        return Response(data=res.text, error=None)
    except RequestException as e:
        return Response(data=None, error=str(e))

Now, when I call this function for both good and bad URLs:

for url in ['http://www.simononsoftware.com/nopage',
            'http://xxx.simononsoftware.com',
            'htttt://example.com',
            'http://www.simononsoftware.com']:
    print(url)
    print(download_data(url))

I get:

http://www.simononsoftware.com/nopage
Response(data=None, error='404 Client Error: Not Found for url: http://www.simononsoftware.com/nopage')
http://xxx.simononsoftware.com
Response(data=None, error="HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3215aecb38>: Failed to establish a new connection: [Errno -2] Name or service not known',))")
htttt://example.com
Response(data=None, error="No connection adapters were found for 'htttt://example.com'")
http://www.simononsoftware.com
Response(data='<HUGE HTML HERE>', error=None)
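Since the whole point was to store every fetch result in a database, here is a minimal sketch of that last step. It assumes SQLite, a made-up fetches table, and a hypothetical store_result helper. The real schema and database are not shown in this post:

```python
import sqlite3
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def store_result(conn, url, response):
    """Insert one fetch result; one of data/error is expected to be None."""
    conn.execute(
        'INSERT INTO fetches (url, data, error) VALUES (?, ?, ?)',
        (url, response.data, response.error),
    )
    conn.commit()

# In-memory database; the table layout is an assumption for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fetches (url TEXT, data TEXT, error TEXT)')

store_result(conn, 'http://example.com',
             Response(data='<html></html>', error=None))
store_result(conn, 'http://example.com/nopage',
             Response(data=None, error='404 Client Error'))

# Failed fetches are simply the rows where error is not NULL.
failed = conn.execute(
    'SELECT url, error FROM fetches WHERE error IS NOT NULL').fetchall()
print(failed)
```

Because the error is already a plain string, it can go straight into a text column without any extra conversion.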