Requests is quite a good Python library. It simplifies working with the ugly standard Python HTTP libraries. Still, the more I think about interfaces, the more places I see that could be improved, including in the Requests library.
The Basic Requests Usage
The main requests interface which I use is simple:
import requests as r

def download_data(url):
    res = r.get(url)
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error
Can you spot an error there?
Getting The Requests Errors
The requests library also raises some exceptions, which are not caught there.
Missing The Server
When calling a URL for which there is no server responding:
res = r.get('http://xxx.simononsoftware.com')
you will get an exception:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: /nopage (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7faeecc75860>: Failed to establish a new connection: [Errno -2] Name or service not known',))
A Bad URL
When calling a URL with a missing resource:
res = r.get('http://www.simononsoftware.com/nopage')
you will get no exception; however, the status_code will not be 200.
So it would be nice to have one unified way of getting the error.
To catch all the errors we should check both: exceptions and status_code:
import requests as r

def download_data(url):
    try:
        res = r.get(url)
    except Exception:
        # do something on exception
        return
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error
The Return Value
I also need to store in a database all the results of the HTTP fetches, especially the error information. So I need to check for the errors. Some of the errors are caused by the libraries that requests uses. Some are returned by the HTTP server. I need to get all of them. Let’s store them in an object of a class like this one:
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])
The two fields, data and error, are mutually exclusive: only one of them can be filled, and the other will be None.
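One way to keep the two fields exclusive is to build every Response through small constructor helpers. This is only a minimal sketch: the names success, failure and is_ok are hypothetical and are not used elsewhere in this post.

```python
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def success(data):
    """Build a Response that carries data and no error (hypothetical helper)."""
    return Response(data=data, error=None)


def failure(error):
    """Build a Response that carries an error message and no data (hypothetical helper)."""
    return Response(data=None, error=str(error))


def is_ok(response):
    """A Response counts as successful when its error field is empty."""
    return response.error is None


print(success('<html>'))
print(failure(ValueError('boom')))
```

Because callers never fill the fields by hand, it is harder to end up with a Response where both data and error are set.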
I have added the return value to the above function:
import requests as r
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    try:
        res = r.get(url)
    except Exception as e:
        return Response(data=None, error=e)
    if res.status_code == 200:
        return Response(data=res, error=None)
    else:
        return Response(data=None, error=res.reason)
I think the above function is ugly. Too many checks and returns. It would be great to have just two returns: one for the failure, one for the success.
The Unified Interface
After digging through the requests code and documentation, I found that all the exceptions raised by the library are in the requests.exceptions module and inherit from the RequestException class. This class has fields containing the request and the response objects (if there was any response).
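The hierarchy can be checked without a working server. This is a minimal sketch: the describe_failure helper and the 5 second timeout are assumptions made for illustration. It relies on the fact that an unsupported URL scheme fails before any connection is attempted.

```python
import requests
from requests.exceptions import RequestException

# The specific exception classes all derive from RequestException,
# so a single except clause can cover them.
for exc_class in (requests.exceptions.ConnectionError,
                  requests.exceptions.HTTPError,
                  requests.exceptions.Timeout,
                  requests.exceptions.InvalidSchema):
    assert issubclass(exc_class, RequestException)


def describe_failure(url):
    """Return (exception class name, has_response) for a failed fetch, or None.

    Hypothetical helper, for illustration only.
    """
    try:
        requests.get(url, timeout=5)
    except RequestException as e:
        # e.response stays None when the failure happened before any
        # HTTP response arrived (bad scheme, DNS failure, refused connection).
        return type(e).__name__, e.response is not None
    return None


# No adapter is registered for the 'htttt' scheme, so no network is needed here.
print(describe_failure('htttt://example.com'))
```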
The response object also has a method, raise_for_status(), which raises an exception when the status code indicates an error (4xx or 5xx) rather than a success such as 200.
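To see how raise_for_status() behaves for different status codes without calling a real server, a Response object can be filled in by hand. This is only a sketch: the fake_response and status_error helpers and the example.invalid URL are hypothetical, and building a Response manually is a testing shortcut rather than normal usage.

```python
import requests
from requests.exceptions import HTTPError


def fake_response(status_code, reason, url='http://example.invalid/'):
    """Build a Response by hand (hypothetical helper, offline illustration only)."""
    res = requests.Response()
    res.status_code = status_code
    res.reason = reason
    res.url = url
    return res


def status_error(res):
    """Return the HTTPError message for res, or None when nothing is raised."""
    try:
        res.raise_for_status()
    except HTTPError as e:
        return str(e)
    return None


# Success and redirect codes raise nothing; 4xx and 5xx codes raise HTTPError.
for code, reason in [(200, 'OK'), (302, 'Found'),
                     (404, 'Not Found'), (503, 'Service Unavailable')]:
    print(code, status_error(fake_response(code, reason)))
```

HTTPError is itself a RequestException, so one except clause can handle bad status codes and connection problems together.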
Let’s use all this information to change the download_data function:
import requests as r
from collections import namedtuple
from requests.exceptions import RequestException

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    """Returns Response object with data, or error information."""
    try:
        res = r.get(url)
        res.raise_for_status()
        return Response(data=res.text, error=None)
    except RequestException as e:
        return Response(data=None, error=str(e))
This way when using the requests library for good and bad URLs:
for url in ['http://www.simononsoftware.com/nopage',
            'http://xxx.simononsoftware.com',
            'htttt://example.com',
            'http://www.simononsoftware.com']:
    print(url)
    print(download_data(url))
you will get:
http://www.simononsoftware.com/nopage
Response(data=None, error='404 Client Error: Not Found for url: http://www.simononsoftware.com/nopage')
http://xxx.simononsoftware.com
Response(data=None, error="HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3215aecb38>: Failed to establish a new connection: [Errno -2] Name or service not known',))")
htttt://example.com
Response(data=None, error="No connection adapters were found for 'htttt://example.com'")
http://www.simononsoftware.com
Response(data='<HUGE HTML HERE>', error=None)
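Since every result is now a plain (data, error) pair, storing it in a database needs no special cases. This is a minimal sketch using an in-memory SQLite database. The fetches table, its columns and the save_result helper are hypothetical, and the two Response values are hand-written samples in the shape of the results above.

```python
import sqlite3
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def save_result(conn, url, response):
    """Store one fetch result; only one of data and error is expected to be set."""
    conn.execute(
        'INSERT INTO fetches (url, data, error) VALUES (?, ?, ?)',
        (url, response.data, response.error),
    )
    conn.commit()


conn = sqlite3.connect(':memory:')
# Hypothetical schema: one row per fetch, NULL in the unused column.
conn.execute('CREATE TABLE fetches (url TEXT, data TEXT, error TEXT)')

# Sample values only, not real fetch results.
save_result(conn, 'http://www.simononsoftware.com',
            Response(data='<HUGE HTML HERE>', error=None))
save_result(conn, 'htttt://example.com',
            Response(data=None,
                     error="No connection adapters were found for 'htttt://example.com'"))

# Failed fetches are simply the rows with a non-NULL error column.
failed = conn.execute(
    'SELECT url FROM fetches WHERE error IS NOT NULL').fetchall()
print(failed)
```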