Objective

I am trying to extract all possible emoticons from a Unicode word list. I am using Python 3 with an Anaconda installation, and therefore cannot use a package such as emoji.py.

Here is a sample bag of words (BoW) from the word list.

lst = ['✅','türkçe','Çile','ısp','İst','ğ','some','#','@','@one','#thing','','1','41','ç','ö','⏱','⏱','👏','₺','€',':)',':/']

The expected output is this:

out = ['✅','⏱', '⏱','👏']

Attempt 1

A list comprehension that checks whether the characters are ASCII:

[w for w in lst if len(w) != len(w.encode())]

However, this does not give the desired output, because there are non-ASCII letters in the text. Also, currency symbols are not emoticons.

['✅', 'türkçe', 'Çile', 'ısp', 'İst', 'ğ', 'ç', 'ö', '⏱', '⏱', '👏', '₺', '€']
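To illustrate why this check over-matches, here is a minimal sketch that compares the character count with the UTF-8 byte count for a few of the sample words. The names `samples` and `sizes` are only for illustration. Any character outside ASCII takes more than one byte in UTF-8, so accented letters and currency signs are flagged along with the emoji.

```python
# Compare the number of characters with the number of UTF-8 bytes.
# Any word containing a non-ASCII character has more bytes than characters,
# whether that character is a letter, a currency sign, or an emoji.
samples = ['some', 'türkçe', '€', '✅', '👏']

sizes = {w: (len(w), len(w.encode('utf-8'))) for w in samples}

for word, (n_chars, n_bytes) in sizes.items():
    flagged = n_chars != n_bytes
    print(f"{word!r}: {n_chars} chars, {n_bytes} bytes, flagged={flagged}")
```

Only 'some' passes unflagged here, which is why the length comparison cannot separate emoji from other non-ASCII text.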

Using the NLTK emoticons regular expression

from nltk.tokenize.casual import EMOTICON_RE
EMOTICON_RE.findall(' '.join(lst))

However, EMOTICON_RE can only extract expressions such as :) :/ :(

Here is the list of what I am considering emoticons.

I tried to build a list of emoticons and check whether each word exists in that list, but I could not build a list of emoticons from Unicode character codes.

Can you please suggest something?

I think all of those characters are in the "Symbol, other" category (So). Therefore you can do:

import unicodedata
[w for w in lst if any(unicodedata.category(c) == 'So' for c in w)]
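Put together as a self-contained sketch, using only the standard library and the sample list from the question. The helper name `has_other_symbol` is made up for illustration. Note that the category name is case-sensitive ('So', not 'so'), and that this category is broader than emoji: it also contains characters such as © and °, so extra filtering may be needed for other inputs.

```python
import unicodedata

lst = ['✅', 'türkçe', 'Çile', 'ısp', 'İst', 'ğ', 'some', '#', '@', '@one',
       '#thing', '', '1', '41', 'ç', 'ö', '⏱', '⏱', '👏', '₺', '€', ':)', ':/']


def has_other_symbol(word):
    """Return True if any character in `word` has Unicode category 'So'."""
    return any(unicodedata.category(ch) == 'So' for ch in word)


# Keep only the words that contain at least one "Symbol, other" character.
# Letters (Ll/Lu), digits (Nd), punctuation (Po/Pe) and currency signs (Sc)
# are all skipped.
out = [w for w in lst if has_other_symbol(w)]
print(out)
```

For the sample list, this should keep only the four emoji entries and drop the Turkish letters, the currency signs, and the ASCII emoticons such as :) and :/.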