Efficient way of comparing multiple lists in Python
I have 5 long lists of word pairs, as in the example below. Note that the lists contain entries holding a single word pair, such as [['salad', 'fat']], as well as entries holding several word pairs, such as [['bread', 'oil'], ['bread', ' salt']].
list_1 = [ [['salad', 'fat']], [['bread', 'oil'], ['bread', 'salt']], [['salt', 'sugar']] ]
list_2 = [ [['salad', 'fat'], ['salt', 'sugar']], [['protein', 'soup']] ]
list_3 = [ [['salad', ' protein']], [['bread', ' oil']], [['sugar', 'salt']] ]
list_4 = [ [['salad', ' fat'], ['salad', 'chicken']] ]
list_5 = [ ['sugar', 'protein'], ['sugar', 'bread'] ]

Now I want to calculate the frequency of each word pair.
For example, for the 5 lists above, the output should look like the following, showing each word pair with its frequency:
output_list = [{['salad', 'fat']: 3}, {['bread', 'oil']: 2}, {['salt', 'sugar']: 2}, {['sugar', 'salt']: 1}, ...] and so on. What is an efficient way of doing this in Python?
The lists are unevenly nested (list_5 is one level shallower than the others), which makes the code ugly, so it would be worth fixing the input lists if you can.
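One way to avoid special-casing list_5 is to normalize the nesting before counting. The sketch below assumes that a "word pair" is any non-empty list whose items are all strings; the helper name `iter_pairs` is hypothetical and not part of any library. It walks the lists recursively, so it copes with any depth of nesting, and yields each pair as a whitespace-stripped tuple ready for counting.

```python
from collections import Counter


def iter_pairs(nested):
    """Yield every word pair found in an arbitrarily nested list.

    Hypothetical helper: assumes a pair is a non-empty list whose items are
    all strings; any other list is treated as a container and recursed into.
    """
    if nested and all(isinstance(item, str) for item in nested):
        # Tuples are hashable (lists are not); strip stray spaces like ' fat'
        yield tuple(word.strip() for word in nested)
    else:
        for item in nested:
            yield from iter_pairs(item)


list_1 = [[['salad', 'fat']], [['bread', 'oil'], ['bread', 'salt']], [['salt', 'sugar']]]
list_2 = [[['salad', 'fat'], ['salt', 'sugar']], [['protein', 'soup']]]
list_3 = [[['salad', ' protein']], [['bread', ' oil']], [['sugar', 'salt']]]
list_4 = [[['salad', ' fat'], ['salad', 'chicken']]]
list_5 = [['sugar', 'protein'], ['sugar', 'bread']]

# All five lists go through the same code path, whatever their nesting depth
counts = Counter(iter_pairs([list_1, list_2, list_3, list_4, list_5]))
print(counts[('salad', 'fat')])  # 3
```

The trade-off is a recursive Python function instead of `itertools.chain`, which is slower for very large inputs and would hit the recursion limit only with extremely deep nesting.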
collections.Counter() is built for this kind of thing, but lists are not hashable, so you need to turn them into tuples (as well as strip off the spurious spaces):
import itertools as it
from collections import Counter

list_1 = [ [['salad', 'fat']], [['bread', 'oil'], ['bread', 'salt']], [['salt', 'sugar']] ]
list_2 = [ [['salad', 'fat'], ['salt', 'sugar']], [['protein', 'soup']] ]
list_3 = [ [['salad', ' protein']], [['bread', ' oil']], [['sugar', 'salt']] ]
list_4 = [ [['salad', ' fat'], ['salad', 'chicken']] ]
list_5 = [ ['sugar', 'protein'], ['sugar', 'bread'] ]

# Turn a pair into a hashable tuple, stripping stray whitespace from each word
t = lambda x: tuple(map(str.strip, x))

# list_1..list_4 are one level deeper than list_5, so flatten them one extra level
c = Counter(map(t, it.chain.from_iterable(it.chain(list_1, list_2, list_3, list_4))))
c += Counter(map(t, list_5))
c

Out: Counter({('bread', 'oil'): 2, ('bread', 'salt'): 1, ('protein', 'soup'): 1, ('salad', 'chicken'): 1, ('salad', 'fat'): 3, ('salad', 'protein'): 1, ('salt', 'sugar'): 2, ('sugar', 'bread'): 1, ('sugar', 'protein'): 1, ('sugar', 'salt'): 1})
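If you want the result in roughly the shape the question asked for, `Counter.most_common()` returns `(pair, count)` tuples ordered from most to least frequent, which can then be wrapped in single-entry dicts. This is a sketch: the Counter below is hard-coded with a subset of the counts above purely so the snippet runs on its own. The keys have to be tuples, because the lists used in the question's sample output are unhashable and cannot be dict keys.

```python
from collections import Counter

# A subset of the counts computed above, written out so this snippet is self-contained
c = Counter({('salad', 'fat'): 3, ('bread', 'oil'): 2, ('salt', 'sugar'): 2,
             ('sugar', 'salt'): 1, ('bread', 'salt'): 1})

# most_common() sorts by descending count; ties keep their insertion order
ranked = c.most_common()

# One single-entry dict per pair, mirroring the question's output_list
output_list = [{pair: count} for pair, count in ranked]
print(output_list[:3])  # [{('salad', 'fat'): 3}, {('bread', 'oil'): 2}, {('salt', 'sugar'): 2}]
```

For only the top n pairs, `c.most_common(n)` avoids building the full list. A single dict such as `dict(ranked)` is usually more convenient than a list of one-entry dicts.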