Simple Requests Interface
The requests library is quite good: it simplifies working with the ugly standard Python HTTP libraries. Today I found an even nicer way to use requests itself.
The Basic Requests Usage
The main requests interface which I use is simple:
    import requests as r

    def download_data(url):
        res = r.get(url)
        if res.status_code == 200:
            pass  # this is OK
        else:
            pass  # there was an error
Can you spot an error there?
Example of Requests Errors
The requests library also raises exceptions, which are not caught there.
When calling a URL for which there is no server responding:
res = r.get('http://xxx.simononsoftware.com')
you will get an exception:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: /nopage (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7faeecc75860>: Failed to establish a new connection: [Errno -2] Name or service not known',))
When calling a URL with a missing resource:
res = r.get('http://www.simononsoftware.com/nopage')
you will get no exception, however the status_code will not be 200 (here it is 404).
So it would be nice to have one unified way of getting the error.
A Better Requests Usage
So to catch the errors, the better code is:
    import requests as r

    def download_data(url):
        try:
            res = r.get(url)
        except Exception:
            # do something on exception
            return
        if res.status_code == 200:
            pass  # this is OK
        else:
            pass  # there was an error
The Return Value
However, I also need to store the results of all the HTTP fetches in a database, especially the error information, so I need to check for the errors. Some of the errors are caused by the libraries that requests uses, some are returned by the HTTP server. I need to get all of them, so let's store them in an object of a class like this one:
    from collections import namedtuple

    Response = namedtuple('Response', ['data', 'error'])
The two fields, data and error, are exclusive: only one of them can be filled, the other will be None.
I have added the return value to the above function:
    import requests as r
    from collections import namedtuple

    Response = namedtuple('Response', ['data', 'error'])

    def download_data(url):
        try:
            res = r.get(url)
        except Exception as e:
            return Response(data=None, error=e)
        if res.status_code == 200:
            return Response(data=res, error=None)
        else:
            return Response(data=None, error=res.reason)
I think the above function is ugly, with too many checks and returns. It would be great to have just two returns: one for the failure and one for the success.
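For illustration, this is roughly how calling code might consume such a result. It is only a sketch: the helper name summarize and the sample values are made up here, and it assumes that data holds text. Only the Response tuple itself comes from the code above.

```python
from collections import namedtuple

# Same two-field tuple as in the post.
Response = namedtuple('Response', ['data', 'error'])


def summarize(result):
    """Return a one-line description of a Response (hypothetical helper)."""
    if result.error is not None:
        return 'FAILED: {}'.format(result.error)
    return 'OK: {} characters'.format(len(result.data))


# Hand-made sample results, no network involved.
ok = Response(data='<html></html>', error=None)
failed = Response(data=None, error='Not Found')

print(summarize(ok))
print(summarize(failed))
```

Because exactly one field is filled, a single check on error is enough to tell the two cases apart.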
The Unified Interface
After digging through the requests code and documentation, it turned out that all the exceptions raised by the library are in the requests.exceptions module, and inherit from the RequestException class. This class has fields containing the request and the response objects (if there was any response).
The response object also has a method raise_for_status(), which raises an HTTPError (itself a RequestException) when the status code indicates a client or server error (4xx or 5xx).
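A small offline check of these two facts might look like the sketch below. It builds bare requests.Response objects by hand instead of calling a server; the helper name status_error and the placeholder URL are assumptions made for this example, not part of the library.

```python
import requests
from requests.exceptions import RequestException

# Network failures and HTTP status failures share one base class.
print(issubclass(requests.exceptions.ConnectionError, RequestException))  # True
print(issubclass(requests.exceptions.HTTPError, RequestException))        # True


def status_error(status_code, reason):
    """Build a Response by hand (no network) and return the exception
    that raise_for_status() raises for it, or None if it does not raise.
    Hypothetical helper, for illustration only."""
    res = requests.Response()
    res.status_code = status_code
    res.reason = reason
    res.url = 'http://example.com/nopage'  # placeholder URL
    try:
        res.raise_for_status()
    except RequestException as e:
        return e
    return None


print(status_error(200, 'OK'))  # None
err = status_error(404, 'Not Found')
print(err)
print(err.response.status_code)  # 404
```

The exception carries the response it was raised for, so the status code is still available in the except block.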
Let’s use all this information. The resulting code looks like this:
    import requests as r
    from collections import namedtuple
    from requests.exceptions import RequestException

    Response = namedtuple('Response', ['data', 'error'])

    def download_data(url):
        """Returns Response object with data, or error information."""
        try:
            res = r.get(url)
            res.raise_for_status()
            return Response(data=res.text, error=None)
        except RequestException as e:
            return Response(data=None, error=str(e))
And then when using the requests library for good and bad URLs:
    for url in ['http://www.simononsoftware.com/nopage',
                'http://xxx.simononsoftware.com',
                'htttt://example.com',
                'http://www.simononsoftware.com']:
        print(url)
        print(download_data(url))
    http://www.simononsoftware.com/nopage
    Response(data=None, error='404 Client Error: Not Found for url: http://www.simononsoftware.com/nopage')
    http://xxx.simononsoftware.com
    Response(data=None, error="HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3215aecb38>: Failed to establish a new connection: [Errno -2] Name or service not known',))")
    htttt://example.com
    Response(data=None, error="No connection adapters were found for 'htttt://example.com'")
    http://www.simononsoftware.com
    Response(data='<HUGE HTML HERE>', error=None)
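Since the goal was to store every fetch result in a database, errors included, here is a minimal sketch of that step. It assumes SQLite through the standard sqlite3 module. The table name fetches, its columns, and the store_result helper are invented for this example, as the post does not say which database or schema is used. The results are written by hand so the sketch runs without network access.

```python
import sqlite3
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def store_result(conn, url, result):
    """Insert one fetch result; one of data/error is expected to be None.
    Hypothetical helper, not part of the original code."""
    conn.execute(
        'INSERT INTO fetches (url, data, error) VALUES (?, ?, ?)',
        (url, result.data, result.error),
    )
    conn.commit()


# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fetches (url TEXT, data TEXT, error TEXT)')

# Hand-made results standing in for the output of download_data().
store_result(conn, 'http://example.com',
             Response(data='<html></html>', error=None))
store_result(conn, 'http://example.com/nopage',
             Response(data=None, error='404 Client Error'))

# None is stored as NULL, so failed fetches are easy to select.
failed_rows = conn.execute(
    'SELECT url, error FROM fetches WHERE error IS NOT NULL').fetchall()
print(failed_rows)
```

Because the final download_data() returns either text or a string error message, both fields fit into plain TEXT columns.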