python - Perfomance of map vs starmap? -
i trying make pure-python (without external dependencies) element-wise comparison of 2 sequences. first solution was:
list(map(operator.eq, seq1, seq2))
then found starmap
function itertools
, seemed pretty similar me. turned out 37% faster on computer in worst case. not obvious me, measured time necessary retrieve 1 element generator (don't know if way correct):
from operator import eq itertools import starmap seq1 = [1,2,3]*10000 seq2 = [1,2,3]*10000 seq2[-1] = 5 gen1 = map(eq, seq1, seq2)) gen2 = starmap(eq, zip(seq1, seq2)) %timeit -n1000 -r10 next(gen1) %timeit -n1000 -r10 next(gen2) 271 ns ± 1.26 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each) 208 ns ± 1.72 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
in retrieving elements second solution 24% more perfomant. after both produce same results list
. somewhere gain 13% in time:
%timeit list(map(eq, seq1, seq2)) %timeit list(starmap(eq, zip(seq1, seq2))) 5.24 ms ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 3.34 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
i don't know how dig deeper in profiling of such nested code? question why first generator faster in retrieving , gain 13% in list
function?
edit: first intention perform element-wise comparison instead of all
, all
function replaced list
, replacement not affect timing ratio.
python 3.6.2 on windows 10 (64bit)
there several factors contribute (in conjunction) observed performance difference:
zip
re-uses returnedtuple
if has reference count of 1 when next__next__
call made.map
builds newtuple
passed "mapped function" every time__next__
call made. won't create new tuple scratch because python maintains storage unused tuples. in casemap
has find unused tuple of right size.starmap
checks if next item in iterable of typetuple
, if passes on.- calling c function within c code
pyobject_call
won't create new tuple passed callee.
so starmap
zip
use 1 tuple on , on again passed operator.eq
reducing function call overhead immensely. map
on other hand create new tuple (or fill c array cpython 3.6 on) every time operator.eq
called. speed difference tuple creation overhead.
instead of linking source code i'll provide cython code can used verify this:
in [1]: %load_ext cython in [2]: %%cython ...: ...: cpython.ref cimport py_decref ...: ...: cpdef func(zipper): ...: = next(zipper) ...: print('a', a) ...: py_decref(a) ...: b = next(zipper) ...: print('a', a) in [3]: func(zip([1, 2], [1, 2])) (1, 1) (2, 2)
yes, tuple
s aren't immutable, simple py_decref
sufficient "trick" zip
believing noone else holds reference returned tuple!
as "tuple-pass-thru":
in [4]: %%cython ...: ...: def func_inner(*args): ...: print(id(args)) ...: ...: def func(*args): ...: print(id(args)) ...: func_inner(*args) in [5]: func(1, 2) 1404350461320 1404350461320
so tuple passed right through (just because these defined c functions!) doesn't happen pure python functions:
in [6]: def func_inner(*args): ...: print(id(args)) ...: ...: def func(*args): ...: print(id(args)) ...: func_inner(*args) ...: in [7]: func(1, 2) 1404350436488 1404352833800
note doesn't happen if called function isn't c function if called c function:
in [8]: %%cython ...: ...: def func_inner_c(*args): ...: print(id(args)) ...: ...: def func(inner, *args): ...: print(id(args)) ...: inner(*args) ...: in [9]: def func_inner_py(*args): ...: print(id(args)) ...: ...: in [10]: func(func_inner_py, 1, 2) 1404350471944 1404353010184 in [11]: func(func_inner_c, 1, 2) 1404344354824 1404344354824
so there lot of "coincidences" leading point starmap
zip
faster calling map
multiple arguments when called function c function...
Comments
Post a Comment