Which One To Use: List Comprehension Or Map?

Author: Szymon Lipiński
Published at: 2016-10-07

Some time ago there was a discussion on IRC which looked like this:

- You should not use map() in Python, but only the list comprehension.
- Why?
- Because it is more Pythonic.
- Why?
- Because all people use the list comprehension.

The other arguments were not convincing either. I have written about map and list comprehensions in Advanced Python Constructs.

Arguments like "because I say so" or "all people do it like this" (evidence, please) are just stupid and not convincing, and the second one is simply false: I use map, so it is not true that nobody uses it. So I decided to run some benchmarks of the memory and time characteristics of list comprehensions and map.

The Test Logic

The logic is simple: sum the squares of all numbers from 1 to MAX_NUMBER. So basically:

data = list(range(1, MAX_NUMBER+1))

@profile
def sum_numbers(data):
    res = 0
    # different algorithms go here
    return res

A couple of remarks:

I have checked different values of MAX_NUMBER. The time and memory differences between the algorithm versions were proportional to MAX_NUMBER, so I show the results for only one value.

The first line may look a little strange. My intention was to build a list of integers before calling the test function, in both Python 2 and Python 3. For Python 2 it would be enough to use data = range(1, MAX_NUMBER+1). In Python 3, however, range returns a lazy range object instead of a list (much like xrange in Python 2, which no longer exists in Python 3). So the simplest way to get a real list that works in both versions is to pass the result of range to list().

Each sum_numbers function is run in a separate Python process.

The structure of all the functions stays the same, only the middle line is replaced with a proper algorithm.

Of course there is also some additional logic. I have also run that with memory_profiler. That’s why there is @profile before the function.
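When a script is run through memory_profiler (for example with python -m memory_profiler script.py), the @profile decorator is made available automatically; when the script is run directly, the name is undefined. The following is a small sketch of a fallback that keeps such a script runnable either way. It is an addition for illustration and not part of the original test code:

```python
try:
    # Real decorator, if the memory_profiler package is installed.
    from memory_profiler import profile
except ImportError:
    def profile(func):
        # No-op stand-in so that @profile does not raise NameError.
        return func


@profile
def sum_numbers(data):
    res = 0
    for x in data:
        res += x * x
    return res


result = sum_numbers([1, 2, 3])
assert result == 14  # 1 + 4 + 9
```

With the fallback in place, the decorated functions behave like plain functions whenever the profiler is not available.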

Basically, we can sum the numbers using a for loop or the sum function, and the values can be produced by the map function or by a list comprehension. This gives four combinations (the for loop ones are tested in two variants, depending on where the squaring happens), plus the simplest case with just a for loop.

The test is run using a function:

@profile
def run_test(test, LOOP_COUNT, data):
    res = 0
    with Timer(LOOP_COUNT, test):
        res = test(data)

    assert res == 333333833333500000

Timer is a class that measures the elapsed time. The assert line ensures that the function returns the correct result. What’s more, it gives the memory profiler one more line on which to show whether the memory usage changed after the test finished.
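The Timer class itself is not shown here. The following is a minimal sketch of what such a context manager could look like, assuming it only needs to measure wall-clock time and print a line similar to the result lines in the sections that follow. The attribute names, the output format, and the way loop_count is handled are all guesses:

```python
import datetime
import time


class Timer(object):
    """Hypothetical stand-in for the Timer used in the tests."""

    def __init__(self, loop_count, test):
        # loop_count is only stored; how the original used it is unknown.
        self.loop_count = loop_count
        self.name = test.__name__
        self.elapsed = None

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.time() - self.start
        delta = datetime.timedelta(seconds=self.elapsed)
        print("{} - {} - {:.2f}s".format(self.name, delta, self.elapsed))
        return False  # do not swallow exceptions raised inside the block


def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x * x
    return res


with Timer(1, sum_numbers_list_for) as timer:
    res = sum_numbers_list_for(list(range(1, 101)))

assert res == 338350  # 100 * 101 * 201 / 6
```

A context manager fits here because the timing starts and stops around the with block without any changes to the tested function.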

The MAX_NUMBER value is 1,000,000.
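The expected value in the assert line can be cross-checked with the closed form for the sum of squares, n(n+1)(2n+1)/6. The sketch below is not part of the benchmark, and the helper name expected_sum_of_squares is made up for this illustration:

```python
MAX_NUMBER = 1000000


def expected_sum_of_squares(n):
    # Closed form for 1^2 + 2^2 + ... + n^2; the product is always divisible by 6.
    return n * (n + 1) * (2 * n + 1) // 6


data = list(range(1, MAX_NUMBER + 1))

# The straightforward sum and the closed form must agree.
assert sum(x * x for x in data) == expected_sum_of_squares(MAX_NUMBER)
print(expected_sum_of_squares(MAX_NUMBER))  # 333333833333500000
```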

The size of the data variable is 8.6 MB.
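How this size was measured is not stated. One possible way, shown here as a sketch for Python 3, is sys.getsizeof. It reports only the list object itself (the array of references), not the integer objects the list points to, and the result varies between Python versions and platforms. The same snippet also shows that range alone is lazy and takes almost no memory:

```python
import sys

MAX_NUMBER = 1000000

lazy = range(1, MAX_NUMBER + 1)        # Python 3: a lazy range object
data = list(range(1, MAX_NUMBER + 1))  # a real list holding every number

print(type(lazy).__name__, sys.getsizeof(lazy), "bytes")
print(type(data).__name__, sys.getsizeof(data) / (1024.0 * 1024.0), "MiB")
```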

For Python 2 I used: 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609].

For Python 3 I used: 3.5.2 (default, Sep 10 2016, 08:21:44) [GCC 5.4.0 20160609].

Simple For Loop

This is the simplest, and the most common, way of iterating over a container. Just make a for loop, and iterate.

@profile
def sum_numbers_list_for(data):
    res = 0
    for x in data:
        res += x*x
    return res

Results for Python 2:

sum_numbers_list_for - 0:01:08.409486 - 68.41s

Line #    Mem usage    Increment   Line Contents
================================================
    10   44.559 MiB    0.000 MiB   @profile
    11                             def sum_numbers_list_for(data):
    12   44.559 MiB    0.000 MiB       res = 0
    13   44.559 MiB    0.000 MiB       for x in data:
    14   44.559 MiB    0.000 MiB           res += x*x
    15   44.559 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.559 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.559 MiB    0.000 MiB       res = 0
   134   44.559 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   44.559 MiB    0.000 MiB           res = test(data)
   136   44.559 MiB    0.000 MiB       assert res == 333333833333500000



Results for Python 3:

sum_numbers_list_for - 0:01:16.079287 - 76.08s

Line #    Mem usage    Increment   Line Contents
================================================
    10   52.809 MiB    0.000 MiB   @profile
    11                             def sum_numbers_list_for(data):
    12   52.809 MiB    0.000 MiB       res = 0
    13   52.809 MiB    0.000 MiB       for x in data:
    14   52.809 MiB    0.000 MiB           res += x*x
    15   52.809 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   52.809 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   52.809 MiB    0.000 MiB       res = 0
   134   52.809 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.320 MiB    0.512 MiB           res = test(data)
   136   53.320 MiB    0.000 MiB       assert res == 333333833333500000



List Comprehension With For Loop Squaring

Here we have a list comprehension that creates a list of the original values. The values are then squared and summed in a for loop.

@profile
def sum_numbers_list_comprehension_for_square(data):
    res = 0
    for x in [n for n in data]:
        res += x*x
    return res

Results for Python 2:

sum_numbers_list_comprehension_for_square - 0:01:45.653211 - 105.65s

Line #    Mem usage    Increment   Line Contents
================================================
    18   44.680 MiB    0.000 MiB   @profile
    19                             def sum_numbers_list_comprehension_for_square(data):
    20   44.680 MiB    0.000 MiB       res = 0
    21   54.457 MiB    9.777 MiB       for x in [n for n in data]:
    22   54.457 MiB    0.000 MiB           res += x*x
    23   54.457 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.680 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.680 MiB    0.000 MiB       res = 0
   134   44.680 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   54.457 MiB    9.777 MiB           res = test(data)
   136   54.457 MiB    0.000 MiB       assert res == 333333833333500000


Results for Python 3:

sum_numbers_list_comprehension_for_square - 0:01:57.452431 - 117.45s

Line #    Mem usage    Increment   Line Contents
================================================
    18   52.750 MiB    0.000 MiB   @profile
    19                             def sum_numbers_list_comprehension_for_square(data):
    20   52.750 MiB    0.000 MiB       res = 0
    21   60.562 MiB    7.812 MiB       for x in [n for n in data]:
    22   60.562 MiB    0.000 MiB           res += x*x
    23   53.121 MiB   -7.441 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   52.750 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   52.750 MiB    0.000 MiB       res = 0
   134   52.750 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.121 MiB    0.371 MiB           res = test(data)
   136   53.121 MiB    0.000 MiB       assert res == 333333833333500000



List Comprehension Squared With For Loop

Here the list comprehension creates a list of squared values, which are then summed in a for loop.

@profile
def sum_numbers_list_comprehension_for_square(data):
    res = 0
    for x in [n*n for n in data]:
        res += x
    return res

Results for Python 2:

sum_numbers_list_comprehension_for_square - 0:01:45.331985 - 105.33s

Line #    Mem usage    Increment   Line Contents
================================================
    18   44.836 MiB    0.000 MiB   @profile
    19                             def sum_numbers_list_comprehension_for_square(data):
    20   44.836 MiB    0.000 MiB       res = 0
    21   54.098 MiB    9.262 MiB       for x in [n for n in data]:
    22   54.098 MiB    0.000 MiB           res += x*x
    23   54.098 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.836 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.836 MiB    0.000 MiB       res = 0
   134   44.836 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   54.098 MiB    9.262 MiB           res = test(data)
   136   54.098 MiB    0.000 MiB       assert res == 333333833333500000


Results for Python 3:

sum_numbers_list_comprehension_for_square - 0:01:57.225886 - 117.23s

Line #    Mem usage    Increment   Line Contents
================================================
    18   52.688 MiB    0.000 MiB   @profile
    19                             def sum_numbers_list_comprehension_for_square(data):
    20   52.688 MiB    0.000 MiB       res = 0
    21   60.500 MiB    7.812 MiB       for x in [n for n in data]:
    22   60.500 MiB    0.000 MiB           res += x*x
    23   53.059 MiB   -7.441 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   52.688 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   52.688 MiB    0.000 MiB       res = 0
   134   52.688 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.059 MiB    0.371 MiB           res = test(data)
   136   53.059 MiB    0.000 MiB       assert res == 333333833333500000



List Comprehension With Sum()

Here the list comprehension creates a list of squared values, which is then summed using the sum function.

@profile
def sum_numbers_list_comprehension_squared_sum(data):
    res = 0
    res = sum([n*n for n in data])
    return res

Results for Python 2:

sum_numbers_list_comprehension_squared_sum - 0:00:34.678465 - 34.68s

Line #    Mem usage    Increment   Line Contents
================================================
    34   45.551 MiB    0.000 MiB   @profile
    35                             def sum_numbers_list_comprehension_squared_sum(data):
    36   45.551 MiB    0.000 MiB       res = 0
    37   87.156 MiB   41.605 MiB       res = sum([n*n for n in data])
    38   79.777 MiB   -7.379 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   45.551 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   45.551 MiB    0.000 MiB       res = 0
   134   45.551 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   79.777 MiB   34.227 MiB           res = test(data)
   136   79.777 MiB    0.000 MiB       assert res == 333333833333500000


Results for Python 3:

sum_numbers_list_comprehension_squared_sum - 0:00:37.741196 - 37.74s

Line #    Mem usage    Increment   Line Contents
================================================
    34   52.711 MiB    0.000 MiB   @profile
    35                             def sum_numbers_list_comprehension_squared_sum(data):
    36   52.711 MiB    0.000 MiB       res = 0
    37   91.566 MiB   38.855 MiB       res = sum([n*n for n in data])
    38   53.332 MiB  -38.234 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   52.711 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   52.711 MiB    0.000 MiB       res = 0
   134   52.711 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.332 MiB    0.621 MiB           res = test(data)
   136   53.332 MiB    0.000 MiB       assert res == 333333833333500000


Map With For Loop Squaring

Here we have the map() function, which in fact just returns the original values. They are then squared and summed in a for loop.

@profile
def sum_numbers_map_for_square(data):
    res = 0
    for x in map(lambda n: n, data):
        res += x*x
    return res

Results for Python 2:

sum_numbers_map_for_square - 0:02:19.674826 - 139.67s

Line #    Mem usage    Increment   Line Contents
================================================
    41   44.602 MiB    0.000 MiB   @profile
    42                             def sum_numbers_map_for_square(data):
    43   44.602 MiB    0.000 MiB       res = 0
    44   52.250 MiB    7.648 MiB       for x in map(lambda n: n, data):
    45   52.250 MiB    0.000 MiB           res += x*x
    46   52.250 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.602 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.602 MiB    0.000 MiB       res = 0
   134   44.602 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   52.250 MiB    7.648 MiB           res = test(data)
   136   52.250 MiB    0.000 MiB       assert res == 333333833333500000


Results for Python 3:

sum_numbers_map_for_square - 0:02:30.948538 - 150.95s

Line #    Mem usage    Increment   Line Contents
================================================
    41   53.016 MiB    0.000 MiB   @profile
    42                             def sum_numbers_map_for_square(data):
    43   53.016 MiB    0.000 MiB       res = 0
    44   53.016 MiB    0.000 MiB       for x in map(lambda n: n, data):
    45   53.016 MiB    0.000 MiB           res += x*x
    46   53.016 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   53.016 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   53.016 MiB    0.000 MiB       res = 0
   134   53.016 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.270 MiB    0.254 MiB           res = test(data)
   136   53.270 MiB    0.000 MiB       assert res == 333333833333500000



Map Squared With For Loop

Here the map returns squared values, which are summed using the for loop.

@profile
def sum_numbers_map_squared_for(data):
    res = 0
    for x in map(lambda n: n*n, data):
        res += x
    return res

Results for Python 2:

sum_numbers_map_squared_for - 0:02:21.770322 - 141.77s

Line #    Mem usage    Increment   Line Contents
================================================
    49   44.570 MiB    0.000 MiB   @profile
    50                             def sum_numbers_map_squared_for(data):
    51   44.570 MiB    0.000 MiB       res = 0
    52   75.680 MiB   31.109 MiB       for x in map(lambda n: n*n, data):
    53   75.680 MiB    0.000 MiB           res += x
    54   75.680 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.570 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.570 MiB    0.000 MiB       res = 0
   134   44.570 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   75.680 MiB   31.109 MiB           res = test(data)
   136   75.680 MiB    0.000 MiB       assert res == 333333833333500000

Results for Python 3:

sum_numbers_map_squared_for - 0:02:39.226192 - 159.23s

Line #    Mem usage    Increment   Line Contents
================================================
    49   53.508 MiB    0.000 MiB   @profile
    50                             def sum_numbers_map_squared_for(data):
    51   53.508 MiB    0.000 MiB       res = 0
    52   53.508 MiB    0.000 MiB       for x in map(lambda n: n*n, data):
    53   53.508 MiB    0.000 MiB           res += x
    54   53.508 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   53.508 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   53.508 MiB    0.000 MiB       res = 0
   134   53.508 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   53.762 MiB    0.254 MiB           res = test(data)
   136   53.762 MiB    0.000 MiB       assert res == 333333833333500000


Map Squared With Sum

And the most common functional way of implementing this: map returning squared values, which are summed using the sum function.

@profile
def sum_numbers_map_squared_sum(data):
    res = 0
    res = sum(map(lambda n: n*n, data))
    return res

Results for Python 2:

sum_numbers_map_squared_sum - 0:01:11.721421 - 71.72s

Line #    Mem usage    Increment   Line Contents
================================================
    57   44.527 MiB    0.000 MiB   @profile
    58                             def sum_numbers_map_squared_sum(data):
    59   44.527 MiB    0.000 MiB       res = 0
    60   75.637 MiB   31.109 MiB       res = sum(map(lambda n: n*n, data))
    61   75.637 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   44.527 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   44.527 MiB    0.000 MiB       res = 0
   134   44.527 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   75.637 MiB   31.109 MiB           res = test(data)
   136   75.637 MiB    0.000 MiB       assert res == 333333833333500000


Results for Python 3:

sum_numbers_map_squared_sum - 0:01:17.069232 - 77.07s

Line #    Mem usage    Increment   Line Contents
================================================
    57   53.781 MiB    0.000 MiB   @profile
    58                             def sum_numbers_map_squared_sum(data):
    59   53.781 MiB    0.000 MiB       res = 0
    60   53.781 MiB    0.000 MiB       res = sum(map(lambda n: n*n, data))
    61   53.781 MiB    0.000 MiB       return res



Line #    Mem usage    Increment   Line Contents
================================================
   131   53.781 MiB    0.000 MiB   @profile
   132                             def run_test(test, LOOP_COUNT, data):
   133   53.781 MiB    0.000 MiB       res = 0
   134   53.781 MiB    0.000 MiB       with Timer(LOOP_COUNT, test):
   135   54.035 MiB    0.254 MiB           res = test(data)
   136   54.035 MiB    0.000 MiB       assert res == 333333833333500000



Summary

Lots of data. Let’s sum it up.

Test Name          Python Version    Time [s]    Memory Jump [MiB]
==================================================================
simple for loop    2                 68.41       0.0
simple for loop    3                 76.08       0.0
compr. for sq.     2                 105.65      9.7
compr. for sq.     3                 117.45      7.8
compr. sq. for     2                 105.33      9.2
compr. sq. for     3                 117.23      7.8
compr. sum         2                 34.68       41.6
compr. sum         3                 37.74       38.8
map for sq.        2                 139.67      7.6
map for sq.        3                 150.95      0.0
map sq. for        2                 141.77      31.1
map sq. for        3                 159.23      0.0
map sq. sum        2                 71.72       31.1
map sq. sum        3                 77.07       0.0

I’m interested only in the memory jump inside the test function. I ignore whether the memory decreased later.

A couple of remarks:

The simple for loop is not the fastest: the list comprehension with sum() took about half the time.

The list comprehension has a large memory overhead, as it builds a new list, so generally I would expect it to be slower and to use more memory.

I didn’t expect such a huge memory jump in the comprehension+sum algorithm. However, it was much faster than the for loop.

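Map is generally the slowest. However, map+sum has the best memory-time combination: in Python 3, where map returns a lazy iterator instead of a list, it needs no additional memory and is about as fast as the simple for loop.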

Python 3 was slower than Python 2 in every test.
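The memory difference between building a list and consuming map lazily can be reproduced in Python 3 with the standard tracemalloc module. This is only a sketch: it uses a different tool than the memory_profiler used for the results above, so the absolute numbers are not comparable with the table, and the peak_memory helper and the smaller input size are assumptions made for this illustration:

```python
import tracemalloc

data = list(range(1, 200001))


def peak_memory(func):
    # Return (result, peak number of bytes allocated while func runs).
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


res_list, peak_list = peak_memory(lambda: sum([n * n for n in data]))
res_map, peak_map = peak_memory(lambda: sum(map(lambda n: n * n, data)))

assert res_list == res_map
print("list comprehension peak: {:.1f} MiB".format(peak_list / 1048576.0))
print("map peak:                {:.1f} MiB".format(peak_map / 1048576.0))
```

In Python 2 map returns a full list, so there the two variants would be expected to behave alike. The tracemalloc module is not part of the Python 2 standard library.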
