python - Perfomance of map vs starmap? -

i trying make pure-python (without external dependencies) element-wise comparison of 2 sequences. first solution was:

list(map(operator.eq, seq1, seq2))

then found starmap function itertools, seemed pretty similar me. turned out 37% faster on computer in worst case. not obvious me, measured time necessary retrieve 1 element generator (don't know if way correct):

from operator import eq itertools import starmap  seq1 = [1,2,3]*10000 seq2 = [1,2,3]*10000 seq2[-1] = 5  gen1 = map(eq, seq1, seq2)) gen2 = starmap(eq, zip(seq1, seq2))  %timeit -n1000 -r10 next(gen1) %timeit -n1000 -r10 next(gen2)  271 ns ± 1.26 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each) 208 ns ± 1.72 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)

in retrieving elements second solution 24% more perfomant. after both produce same results list. somewhere gain 13% in time:

%timeit list(map(eq, seq1, seq2)) %timeit list(starmap(eq, zip(seq1, seq2)))  5.24 ms ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 3.34 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

i don't know how dig deeper in profiling of such nested code? question why first generator faster in retrieving , gain 13% in list function?

edit: first intention perform element-wise comparison instead of all, all function replaced list, replacement not affect timing ratio.

python 3.6.2 on windows 10 (64bit)

there several factors contribute (in conjunction) observed performance difference:

zip re-uses returned tuple if has reference count of 1 when next __next__ call made.
map builds new tuple passed "mapped function" every time __next__ call made. won't create new tuple scratch because python maintains storage unused tuples. in case map has find unused tuple of right size.
starmap checks if next item in iterable of type tuple , if passes on.
calling c function within c code pyobject_call won't create new tuple passed callee.

so starmap zip use 1 tuple on , on again passed operator.eq reducing function call overhead immensely. map on other hand create new tuple (or fill c array cpython 3.6 on) every time operator.eq called. speed difference tuple creation overhead.

instead of linking source code i'll provide cython code can used verify this:

in [1]: %load_ext cython  in [2]: %%cython    ...:    ...: cpython.ref cimport py_decref    ...:    ...: cpdef func(zipper):    ...:     = next(zipper)    ...:     print('a', a)    ...:     py_decref(a)    ...:     b = next(zipper)    ...:     print('a', a)  in [3]: func(zip([1, 2], [1, 2])) (1, 1) (2, 2)

yes, tuples aren't immutable, simple py_decref sufficient "trick" zip believing noone else holds reference returned tuple!

as "tuple-pass-thru":

in [4]: %%cython    ...:    ...: def func_inner(*args):    ...:     print(id(args))    ...:    ...: def func(*args):    ...:     print(id(args))    ...:     func_inner(*args)  in [5]: func(1, 2) 1404350461320 1404350461320

so tuple passed right through (just because these defined c functions!) doesn't happen pure python functions:

in [6]: def func_inner(*args):    ...:     print(id(args))    ...:    ...: def func(*args):    ...:     print(id(args))    ...:     func_inner(*args)    ...:  in [7]: func(1, 2) 1404350436488 1404352833800

note doesn't happen if called function isn't c function if called c function:

in [8]: %%cython    ...:     ...: def func_inner_c(*args):    ...:     print(id(args))    ...:     ...: def func(inner, *args):    ...:     print(id(args))    ...:     inner(*args)    ...:  in [9]: def func_inner_py(*args):     ...:     print(id(args))     ...:     ...:  in [10]: func(func_inner_py, 1, 2) 1404350471944 1404353010184  in [11]: func(func_inner_c, 1, 2) 1404344354824 1404344354824

so there lot of "coincidences" leading point starmap zip faster calling map multiple arguments when called function c function...

Search This Blog

Single

python - Perfomance of map vs starmap? -

Comments

Post a Comment