Which One to Use: List Comprehension or Map?
Some time ago there was a discussion on IRC, which went like this:
- You should not use the map() function in Python, only list comprehensions.
- Why?
- Because it is more Pythonic.
- Why?
- Because everybody uses list comprehensions.
The other arguments were not convincing either. I have written about map and list comprehensions in Advanced Python Constructs.
Arguments like “because I say so” or “everybody does it like this” (evidence, please) are just stupid, not convincing, and simply false (I use map, so it’s not true that nobody uses it). So I decided to run some benchmarks comparing the memory and time characteristics of list comprehensions and map.
The Test Logic
The logic is simple: sum the squares of all numbers from 1 to MAX_NUMBER. So basically:
data = list(range(1, MAX_NUMBER+1))

@profile
def sum_numbers(data):
    res = 0
    # different algorithms go here
    return res
A couple of remarks:
I have checked different values of MAX_NUMBER. The time/memory difference between the algorithm versions is proportional to MAX_NUMBER, so it’s fine to show the results for only one value.
The first line may look a little strange. My intention was to build a list of integers before calling the test function, in both Python 2 and Python 3. For Python 2, data = range(1, MAX_NUMBER+1) would be enough, because range returns a list there. In Python 3, however, range returns a lazy range object instead of a list (it is essentially Python 2’s xrange, which no longer exists under that name). So the simplest way to get a real list in Python 3 is to pass the range object to list().
Each sum_numbers function is run in a separate Python process.
The structure of all the functions stays the same; only the middle line is replaced with the actual algorithm.
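The post does not show how the separate processes are started. A minimal sketch of one way to do it, assuming the standard-library subprocess module and a hypothetical child script template (CHILD_TEMPLATE, VARIANT_BODIES, and run_in_fresh_process are illustrative names, not the author’s code):

```python
import subprocess
import sys

# Hypothetical child script: build the data list, run ONE variant, print res.
# Each call starts a brand-new interpreter, so no memory is shared between runs.
CHILD_TEMPLATE = """\
data = list(range(1, {max_number} + 1))
res = 0
{body}
print(res)
"""

# Two of the variant bodies from this post, as source text (hypothetical subset).
VARIANT_BODIES = {
    "sum_numbers_list_for": "for x in data:\n    res += x*x",
    "sum_numbers_map_squared_sum": "res = sum(map(lambda n: n*n, data))",
}


def run_in_fresh_process(body, max_number):
    """Run one variant body in a separate Python process and return its result."""
    script = CHILD_TEMPLATE.format(max_number=max_number, body=body)
    completed = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    for name, body in VARIANT_BODIES.items():
        print(name, run_in_fresh_process(body, max_number=1000))
```

Starting a fresh interpreter per variant keeps one variant’s allocations from affecting the memory readings of the next.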
Of course, there is also some additional logic. I have also run everything under memory_profiler, which is why there is a @profile decorator before the function.
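The @profile decorator is not a Python built-in. A minimal sketch of one common way to keep the same file runnable with and without the profiler, assuming (as I understand it) that memory_profiler’s command-line runner, `python -m memory_profiler script.py`, places profile into builtins before executing the script:

```python
import builtins

# Assumption: `python -m memory_profiler script.py` injects `profile` into
# builtins. Under plain `python` there is no such name, so fall back to a
# no-op decorator that returns the function unchanged.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x*x
    return res


if __name__ == "__main__":
    print(sum_numbers_list_for(list(range(1, 1001))))
```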
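The squared values can be produced with the map function or a list comprehension, and then summed with a for loop or the sum function. For the for-loop versions, I squared the numbers either inside the loop or inside the map/comprehension. Together with the simplest version, a plain for loop, this gives seven variants.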
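As a scaled-down sketch (Python 3, not the original benchmark script), here are the seven variants side by side; running them on the same small input is a quick way to confirm they all compute the same sum. The name sum_numbers_list_comprehension_squared_for is hypothetical: the post reuses sum_numbers_list_comprehension_for_square for both comprehension for-loop variants.

```python
def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x*x
    return res


def sum_numbers_list_comprehension_for_square(data):
    res = 0
    for x in [n for n in data]:      # copy of the values, squared in the loop
        res += x*x
    return res


def sum_numbers_list_comprehension_squared_for(data):  # hypothetical name
    res = 0
    for x in [n*n for n in data]:    # squared in the comprehension
        res += x
    return res


def sum_numbers_list_comprehension_squared_sum(data):
    return sum([n*n for n in data])


def sum_numbers_map_for_square(data):
    res = 0
    for x in map(lambda n: n, data):  # identity map, squared in the loop
        res += x*x
    return res


def sum_numbers_map_squared_for(data):
    res = 0
    for x in map(lambda n: n*n, data):
        res += x
    return res


def sum_numbers_map_squared_sum(data):
    return sum(map(lambda n: n*n, data))


VARIANTS = [
    sum_numbers_list_for,
    sum_numbers_list_comprehension_for_square,
    sum_numbers_list_comprehension_squared_for,
    sum_numbers_list_comprehension_squared_sum,
    sum_numbers_map_for_square,
    sum_numbers_map_squared_for,
    sum_numbers_map_squared_sum,
]

if __name__ == "__main__":
    data = list(range(1, 1001))
    for func in VARIANTS:
        print(func.__name__, func(data))
```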
The test is run using this function:
@profile
def run_test(test, LOOP_COUNT, data):
    res = 0
    with Timer(LOOP_COUNT, test):
        res = test(data)
    assert res == 333333833333500000
The Timer is a class for measuring the elapsed time. The assert line ensures that the function computes the correct result. What’s more, the memory profiler shows whether the memory usage changed after the test finished.
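The Timer class itself is not shown. A hypothetical stand-in, assuming it only measures the wall-clock time of the `with` block and prints it in the same "name - h:mm:ss.ffffff - N.NNs" shape as the results below; what the original does with LOOP_COUNT is unknown, so this sketch just stores it:

```python
import time
from datetime import timedelta


class Timer:
    """Hypothetical stand-in for the post's Timer helper (not the original code)."""

    def __init__(self, loop_count, func):
        # The original's use of LOOP_COUNT is not shown; it is only stored here.
        self.loop_count = loop_count
        self.name = func.__name__
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        # str(timedelta) gives the "0:01:08.409486" style used in the results.
        print("%s - %s - %.2fs" % (self.name, timedelta(seconds=self.elapsed), self.elapsed))
        return False  # never swallow exceptions, e.g. a failed assert


if __name__ == "__main__":
    def sum_numbers_list_for(data):
        res = 0
        for x in data:
            res += x*x
        return res

    with Timer(1, sum_numbers_list_for):
        sum_numbers_list_for(list(range(1, 100001)))
```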
The MAX_NUMBER value is 1,000,000.
The size of the data variable is 8.6MB.
For Python 2 I used: 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609].
For Python 3 I used: 3.5.2 (default, Sep 10 2016, 08:21:44) [GCC 5.4.0 20160609].
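The expected value in the assert follows from the closed form for the sum of squares, 1² + 2² + … + n² = n(n+1)(2n+1)/6. A short Python check for n = 1,000,000:

```python
def sum_of_squares(n):
    """Closed form for 1**2 + 2**2 + ... + n**2 (always exactly divisible by 6)."""
    return n * (n + 1) * (2 * n + 1) // 6


MAX_NUMBER = 1000000
# Matches the value checked by the assert in run_test.
assert sum_of_squares(MAX_NUMBER) == 333333833333500000

# Cross-check the formula against a brute-force sum for a small n.
assert sum_of_squares(100) == sum(n * n for n in range(1, 101))
```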
A Simple For Loop
This is the simplest and most common way of iterating over a container: just write a for loop.
@profile
def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x*x
    return res
Results for Python 2:
sum_numbers_list_for - 0:01:08.409486 - 68.41s
Line # Mem usage Increment Line Contents
================================================
10 44.559 MiB 0.000 MiB @profile
11 def sum_numbers_list_for(data):
12 44.559 MiB 0.000 MiB res = 0
13 44.559 MiB 0.000 MiB for x in data:
14 44.559 MiB 0.000 MiB res += x*x
15 44.559 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.559 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.559 MiB 0.000 MiB res = 0
134 44.559 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 44.559 MiB 0.000 MiB res = test(data)
136 44.559 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_list_for - 0:01:16.079287 - 76.08s
Line # Mem usage Increment Line Contents
================================================
10 52.809 MiB 0.000 MiB @profile
11 def sum_numbers_list_for(data):
12 52.809 MiB 0.000 MiB res = 0
13 52.809 MiB 0.000 MiB for x in data:
14 52.809 MiB 0.000 MiB res += x*x
15 52.809 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 52.809 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 52.809 MiB 0.000 MiB res = 0
134 52.809 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.320 MiB 0.512 MiB res = test(data)
136 53.320 MiB 0.000 MiB assert res == 333333833333500000
A List Comprehension With For Loop Squaring
Here the list comprehension creates a copy of the original values, which are then squared and summed in a for loop.
@profile
def sum_numbers_list_comprehension_for_square(data):
    res = 0
    for x in [n for n in data]:
        res += x*x
    return res
Results for Python 2:
sum_numbers_list_comprehension_for_square - 0:01:45.653211 - 105.65s
Line # Mem usage Increment Line Contents
================================================
18 44.680 MiB 0.000 MiB @profile
19 def sum_numbers_list_comprehension_for_square(data):
20 44.680 MiB 0.000 MiB res = 0
21 54.457 MiB 9.777 MiB for x in [n for n in data]:
22 54.457 MiB 0.000 MiB res += x*x
23 54.457 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.680 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.680 MiB 0.000 MiB res = 0
134 44.680 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 54.457 MiB 9.777 MiB res = test(data)
136 54.457 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_list_comprehension_for_square - 0:01:57.452431 - 117.45s
Line # Mem usage Increment Line Contents
================================================
18 52.750 MiB 0.000 MiB @profile
19 def sum_numbers_list_comprehension_for_square(data):
20 52.750 MiB 0.000 MiB res = 0
21 60.562 MiB 7.812 MiB for x in [n for n in data]:
22 60.562 MiB 0.000 MiB res += x*x
23 53.121 MiB -7.441 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 52.750 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 52.750 MiB 0.000 MiB res = 0
134 52.750 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.121 MiB 0.371 MiB res = test(data)
136 53.121 MiB 0.000 MiB assert res == 333333833333500000
A Squared List Comprehension With For Loop
Here the list comprehension creates a list of squared values, which is then summed in a for loop.
@profile
def sum_numbers_list_comprehension_for_square(data):
    res = 0
    for x in [n*n for n in data]:
        res += x
    return res
Results for Python 2:
sum_numbers_list_comprehension_for_square - 0:01:45.331985 - 105.33s
Line # Mem usage Increment Line Contents
================================================
18 44.836 MiB 0.000 MiB @profile
19 def sum_numbers_list_comprehension_for_square(data):
20 44.836 MiB 0.000 MiB res = 0
21 54.098 MiB 9.262 MiB for x in [n for n in data]:
22 54.098 MiB 0.000 MiB res += x*x
23 54.098 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.836 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.836 MiB 0.000 MiB res = 0
134 44.836 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 54.098 MiB 9.262 MiB res = test(data)
136 54.098 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_list_comprehension_for_square - 0:01:57.225886 - 117.23s
Line # Mem usage Increment Line Contents
================================================
18 52.688 MiB 0.000 MiB @profile
19 def sum_numbers_list_comprehension_for_square(data):
20 52.688 MiB 0.000 MiB res = 0
21 60.500 MiB 7.812 MiB for x in [n for n in data]:
22 60.500 MiB 0.000 MiB res += x*x
23 53.059 MiB -7.441 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 52.688 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 52.688 MiB 0.000 MiB res = 0
134 52.688 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.059 MiB 0.371 MiB res = test(data)
136 53.059 MiB 0.000 MiB assert res == 333333833333500000
A List Comprehension With Sum()
Here the list comprehension creates a list of squared values, which is then summed with the sum function.
@profile
def sum_numbers_list_comprehension_squared_sum(data):
    res = 0
    res = sum([n*n for n in data])
    return res
Results for Python 2:
sum_numbers_list_comprehension_squared_sum - 0:00:34.678465 - 34.68s
Line # Mem usage Increment Line Contents
================================================
34 45.551 MiB 0.000 MiB @profile
35 def sum_numbers_list_comprehension_squared_sum(data):
36 45.551 MiB 0.000 MiB res = 0
37 87.156 MiB 41.605 MiB res = sum([n*n for n in data])
38 79.777 MiB -7.379 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 45.551 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 45.551 MiB 0.000 MiB res = 0
134 45.551 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 79.777 MiB 34.227 MiB res = test(data)
136 79.777 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_list_comprehension_squared_sum - 0:00:37.741196 - 37.74s
Line # Mem usage Increment Line Contents
================================================
34 52.711 MiB 0.000 MiB @profile
35 def sum_numbers_list_comprehension_squared_sum(data):
36 52.711 MiB 0.000 MiB res = 0
37 91.566 MiB 38.855 MiB res = sum([n*n for n in data])
38 53.332 MiB -38.234 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 52.711 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 52.711 MiB 0.000 MiB res = 0
134 52.711 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.332 MiB 0.621 MiB res = test(data)
136 53.332 MiB 0.000 MiB assert res == 333333833333500000
A Map With For Loop Squaring
Here the map() function, with an identity lambda, simply returns the original values. They are then squared and summed in a for loop.
@profile
def sum_numbers_map_for_square(data):
    res = 0
    for x in map(lambda n: n, data):
        res += x*x
    return res
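The Python 2 and Python 3 memory results for the map variants differ because map itself changed: in Python 2 it returns a full list, while in Python 3 it returns a lazy iterator. A small Python 3 illustration:

```python
# Python 3: map() returns a lazy iterator (Python 2's map() built a full list).
data = [1, 2, 3]
m = map(lambda n: n*n, data)

print(isinstance(m, list))  # False
print(iter(m) is m)         # True: the map object is its own iterator
print(list(m))              # [1, 4, 9]
print(list(m))              # []: it can be consumed only once
```

Because the values are produced one at a time, a Python 3 map never holds all of them in memory at once.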
Results for Python 2:
sum_numbers_map_for_square - 0:02:19.674826 - 139.67s
Line # Mem usage Increment Line Contents
================================================
41 44.602 MiB 0.000 MiB @profile
42 def sum_numbers_map_for_square(data):
43 44.602 MiB 0.000 MiB res = 0
44 52.250 MiB 7.648 MiB for x in map(lambda n: n, data):
45 52.250 MiB 0.000 MiB res += x*x
46 52.250 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.602 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.602 MiB 0.000 MiB res = 0
134 44.602 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 52.250 MiB 7.648 MiB res = test(data)
136 52.250 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_map_for_square - 0:02:30.948538 - 150.95s
Line # Mem usage Increment Line Contents
================================================
41 53.016 MiB 0.000 MiB @profile
42 def sum_numbers_map_for_square(data):
43 53.016 MiB 0.000 MiB res = 0
44 53.016 MiB 0.000 MiB for x in map(lambda n: n, data):
45 53.016 MiB 0.000 MiB res += x*x
46 53.016 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 53.016 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 53.016 MiB 0.000 MiB res = 0
134 53.016 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.270 MiB 0.254 MiB res = test(data)
136 53.270 MiB 0.000 MiB assert res == 333333833333500000
A Map Squared With For Loop
Here map returns the squared values, which are summed in a for loop.
@profile
def sum_numbers_map_squared_for(data):
    res = 0
    for x in map(lambda n: n*n, data):
        res += x
    return res
Results for Python 2:
sum_numbers_map_squared_for - 0:02:21.770322 - 141.77s
Line # Mem usage Increment Line Contents
================================================
49 44.570 MiB 0.000 MiB @profile
50 def sum_numbers_map_squared_for(data):
51 44.570 MiB 0.000 MiB res = 0
52 75.680 MiB 31.109 MiB for x in map(lambda n: n*n, data):
53 75.680 MiB 0.000 MiB res += x
54 75.680 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.570 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.570 MiB 0.000 MiB res = 0
134 44.570 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 75.680 MiB 31.109 MiB res = test(data)
136 75.680 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_map_squared_for - 0:02:39.226192 - 159.23s
Line # Mem usage Increment Line Contents
================================================
49 53.508 MiB 0.000 MiB @profile
50 def sum_numbers_map_squared_for(data):
51 53.508 MiB 0.000 MiB res = 0
52 53.508 MiB 0.000 MiB for x in map(lambda n: n*n, data):
53 53.508 MiB 0.000 MiB res += x
54 53.508 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 53.508 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 53.508 MiB 0.000 MiB res = 0
134 53.508 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 53.762 MiB 0.254 MiB res = test(data)
136 53.762 MiB 0.000 MiB assert res == 333333833333500000
A Map Squared With Sum
This is the most common functional way of implementing it: map returns the squared values, which are summed with the sum function.
@profile
def sum_numbers_map_squared_sum(data):
    res = 0
    res = sum(map(lambda n: n*n, data))
    return res
Results for Python 2:
sum_numbers_map_squared_sum - 0:01:11.721421 - 71.72s
Line # Mem usage Increment Line Contents
================================================
57 44.527 MiB 0.000 MiB @profile
58 def sum_numbers_map_squared_sum(data):
59 44.527 MiB 0.000 MiB res = 0
60 75.637 MiB 31.109 MiB res = sum(map(lambda n: n*n, data))
61 75.637 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 44.527 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 44.527 MiB 0.000 MiB res = 0
134 44.527 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 75.637 MiB 31.109 MiB res = test(data)
136 75.637 MiB 0.000 MiB assert res == 333333833333500000
Results for Python 3:
sum_numbers_map_squared_sum - 0:01:17.069232 - 77.07s
Line # Mem usage Increment Line Contents
================================================
57 53.781 MiB 0.000 MiB @profile
58 def sum_numbers_map_squared_sum(data):
59 53.781 MiB 0.000 MiB res = 0
60 53.781 MiB 0.000 MiB res = sum(map(lambda n: n*n, data))
61 53.781 MiB 0.000 MiB return res
Line # Mem usage Increment Line Contents
================================================
131 53.781 MiB 0.000 MiB @profile
132 def run_test(test, LOOP_COUNT, data):
133 53.781 MiB 0.000 MiB res = 0
134 53.781 MiB 0.000 MiB with Timer(LOOP_COUNT, test):
135 54.035 MiB 0.254 MiB res = test(data)
136 54.035 MiB 0.000 MiB assert res == 333333833333500000
The Summary
Lots of data. Let’s sum it up.
Test Name | Python Version | Time [s] | Memory Jump [MiB] |
---|---|---|---|
simple for loop | 2 | 68.41 | 0.0 |
simple for loop | 3 | 76.08 | 0.0 |
compr. for sq. | 2 | 105.65 | 9.7 |
compr. for sq. | 3 | 117.45 | 7.8 |
compr. sq. for | 2 | 105.33 | 9.2 |
compr. sq. for | 3 | 117.23 | 7.8 |
compr. sum | 2 | 34.68 | 41.6 |
compr. sum | 3 | 37.74 | 38.8 |
map for sq. | 2 | 139.67 | 7.6 |
map for sq. | 3 | 150.95 | 0.0 |
map sq. for | 2 | 141.77 | 31.1 |
map sq. for | 3 | 159.23 | 0.0 |
map sq. sum | 2 | 71.72 | 31.1 |
map sq. sum | 3 | 77.07 | 0.0 |
I’m interested only in the memory jump inside the test function; I ignore any later decrease.
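To observe the same kind of memory jump in Python 3 without memory_profiler, a minimal sketch with the standard-library tracemalloc module can compare the peak allocation of comprehension+sum and map+sum. This is a swapped-in technique: tracemalloc counts Python-level allocations rather than process memory, so its numbers will not match the MiB values above. peak_allocation, comprehension_sum, and map_sum are illustrative names.

```python
import tracemalloc


def peak_allocation(func, data):
    """Return (result, peak bytes allocated by Python while func(data) runs)."""
    tracemalloc.start()
    try:
        result = func(data)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak


def comprehension_sum(data):
    return sum([n*n for n in data])        # builds the whole list first


def map_sum(data):
    return sum(map(lambda n: n*n, data))   # Python 3: one value at a time


if __name__ == "__main__":
    data = list(range(1, 100001))  # created before tracing starts
    for func in (comprehension_sum, map_sum):
        result, peak = peak_allocation(func, data)
        print("%s: result %d, peak %.2f MiB" % (func.__name__, result, peak / 2**20))
```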
A couple of remarks:
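- The simple for loop is not the fastest: comprehension+sum took about half the time (34.68 s vs 68.41 s in Python 2).
- The list comprehension has a noticeable memory overhead, as it builds a whole new list. In the for-loop variants it was also slower than the plain for loop.
- I didn’t expect such a huge memory jump in the comprehension+sum algorithm. However, it was much faster than the for loop.
- Map is generally the slowest. However, in Python 3 map+sum has the best memory-time combination: no memory jump, at about the speed of the plain for loop. In Python 2, map builds a full list, which explains its 31.1 MiB jump; in Python 3 it returns a lazy iterator.
- Python 3 was slower than Python 2 in every test, by roughly 7 to 12 percent.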
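For a quick reproduction without the profiler, a minimal timeit sketch (Python 3) can time a few of the variants on a smaller input. Function names mirror the post; time_variants is a hypothetical helper, and the absolute numbers will not match the tables above, since the machine, Python version, and input size all differ.

```python
import timeit


def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x*x
    return res


def sum_numbers_list_comprehension_squared_sum(data):
    return sum([n*n for n in data])


def sum_numbers_map_squared_sum(data):
    return sum(map(lambda n: n*n, data))


def time_variants(variants, data, number=10):
    """Return {name: best seconds per call} for each variant."""
    timings = {}
    for func in variants:
        # repeat() returns one total per round; the minimum is the least noisy.
        totals = timeit.repeat(lambda: func(data), number=number, repeat=3)
        timings[func.__name__] = min(totals) / number
    return timings


if __name__ == "__main__":
    data = list(range(1, 100001))
    variants = [
        sum_numbers_list_for,
        sum_numbers_list_comprehension_squared_sum,
        sum_numbers_map_squared_sum,
    ]
    for name, seconds in time_variants(variants, data).items():
        print("%s: %.4f s per call" % (name, seconds))
```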