A Simpler Requests Interface
Requests is quite a good Python library. It simplifies using the ugly and terrible standard Python HTTP libraries. Still, the more I think about interfaces, the more places I see that could be improved, including in the Requests library itself.
The Basic Requests Usage
The main requests interface which I use is simple:
import requests as r

def download_data(url):
    res = r.get(url)
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error
Can you spot an error there?
Getting The Requests Errors
The Requests library also raises exceptions, and the function above does not catch them.
Missing The Server
When calling a URL for which there is no server responding:
res = r.get('http://xxx.simononsoftware.com')
you will get an exception:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: /nopage (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7faeecc75860>: Failed to establish a new connection: [Errno -2] Name or service not known',))
A Bad URL
When calling a URL with a missing resource:
res = r.get('http://www.simononsoftware.com/nopage')
you will get no exception; however, the status_code will not be 200:
print(res.status_code)
404
So it would be nice to have one unified way of getting the error.
Catching Errors
To catch all the errors we should check both the exceptions and the status_code:
import requests as r

def download_data(url):
    try:
        res = r.get(url)
    except Exception:
        # do something on exception
        return  # res was never assigned, so stop here
    if res.status_code == 200:
        pass  # this is OK
    else:
        pass  # there was an error
The Return Value
I also need to store in a database all the results of the HTTP fetches, especially the error information. So I need to check for the errors. Some of the errors are caused by the libraries that requests uses. Some are returned by the HTTP server. I need to get all of them. Let’s store them in an object of a class like this one:
from collections import namedtuple
Response = namedtuple('Response', ['data', 'error'])
The two fields, data and error, are mutually exclusive: only one of them can be filled, and the other will be None.
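A quick illustration of that convention, as a minimal sketch. The values below are placeholders rather than real fetch results, and describe is a hypothetical helper that only exists for this example:

```python
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

# Placeholder values for illustration only.
success = Response(data='<html>...</html>', error=None)
failure = Response(data=None, error='404 Client Error')


def describe(response):
    """Return a short label based on which of the two fields is filled."""
    if response.error is None:
        return 'ok'
    return 'failed: ' + str(response.error)


print(describe(success))
print(describe(failure))
```

The caller only has to look at the error field to know which case it is dealing with.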
I have added the return value to the above function:
import requests as r
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    try:
        res = r.get(url)
    except Exception as e:
        return Response(data=None, error=e)
    if res.status_code == 200:
        return Response(data=res, error=None)
    else:
        return Response(data=None, error=res.reason)
I think the above function is ugly. Too many checks and returns. It would be great to have just two returns: one for the failure, one for the success.
The Unified Interface
After digging through the Requests code and documentation, I found that all the exceptions raised by the library live in the requests.exceptions module and inherit from the RequestException class. This class has request and response attributes holding the request and the response objects (response is None if there was no response).
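The inheritance can be checked without any network access. This is a small sketch; the four exception classes picked here are only a sample of what requests.exceptions contains, not an exhaustive list:

```python
from requests import exceptions as rex

# A sample of exception classes from requests.exceptions (not exhaustive).
sample = (rex.ConnectionError, rex.HTTPError, rex.InvalidSchema, rex.Timeout)

# Each one is a subclass of RequestException, so a single
# "except RequestException" clause catches all of them.
for exc_class in sample:
    print(exc_class.__name__, issubclass(exc_class, rex.RequestException))

# An exception created without a response has both attributes set to None.
err = rex.RequestException('no response received')
print(err.request, err.response)
```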
The response object also has a raise_for_status() method, which raises an HTTPError exception when the status code indicates a client or server error (4xx or 5xx).
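To see which status codes make raise_for_status() raise, here is a minimal offline sketch. It builds a bare Response object by hand instead of calling a server, and status_raises is a hypothetical helper used only for this check:

```python
import requests
from requests.exceptions import HTTPError


def status_raises(status_code):
    """Return True if raise_for_status() raises for the given status code."""
    # Build a bare Response by hand so that no network access is needed.
    res = requests.Response()
    res.status_code = status_code
    try:
        res.raise_for_status()
    except HTTPError as e:
        # The exception carries the response that triggered it.
        assert e.response is res
        return True
    return False


for code in (200, 204, 404, 500):
    print(code, status_raises(code))
```

Successful codes other than 200, such as 204, do not raise; only the 4xx and 5xx codes do.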
Let’s use all this information to change the download_data function:
import requests as r
from collections import namedtuple
from requests.exceptions import RequestException

Response = namedtuple('Response', ['data', 'error'])

def download_data(url):
    """Returns Response object with data, or error information.
    """
    try:
        res = r.get(url)
        res.raise_for_status()
        return Response(data=res.text, error=None)
    except RequestException as e:
        return Response(data=None, error=str(e))
This way when using the requests library for good and bad URLs:
for url in ['http://www.simononsoftware.com/nopage',
            'http://xxx.simononsoftware.com',
            'htttt://example.com',
            'http://www.simononsoftware.com']:
    print(url)
    print(download_data(url))
I get:
http://www.simononsoftware.com/nopage
Response(data=None, error='404 Client Error: Not Found for url: http://www.simononsoftware.com/nopage')
http://xxx.simononsoftware.com
Response(data=None, error="HTTPConnectionPool(host='xxx.simononsoftware.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3215aecb38>: Failed to establish a new connection: [Errno -2] Name or service not known',))")
htttt://example.com
Response(data=None, error="No connection adapters were found for 'htttt://example.com'")
http://www.simononsoftware.com
Response(data='<HUGE HTML HERE>', error=None)
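Since the goal was to store every fetch result in a database, here is a rough sketch of how such Response tuples could be written out. The in-memory SQLite database, the fetch_results table layout, the store_result helper, and the example URLs and values are all assumptions made for this illustration:

```python
import sqlite3
from collections import namedtuple

Response = namedtuple('Response', ['data', 'error'])


def store_result(conn, url, response):
    """Insert one fetch result; only one of data/error is expected to be set."""
    conn.execute(
        'INSERT INTO fetch_results (url, data, error) VALUES (?, ?, ?)',
        (url, response.data, response.error),
    )
    conn.commit()


# In-memory database and table layout are assumptions for this sketch.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fetch_results (url TEXT, data TEXT, error TEXT)')

# Placeholder results standing in for what download_data would return.
store_result(conn, 'http://example.com/ok',
             Response(data='<html></html>', error=None))
store_result(conn, 'http://example.com/missing',
             Response(data=None, error='404 Client Error'))

# Failed fetches are simply the rows where the error column is filled.
failed = conn.execute(
    'SELECT url, error FROM fetch_results WHERE error IS NOT NULL'
).fetchall()
print(failed)
```

Because the error is already a plain string, it goes into a text column without any extra conversion.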