unicode - Extract all possible emoticons from a Python list


Objective

I am trying to extract all possible emoticons from a Unicode word list. I am using Python 3 from an Anaconda installation, so I cannot use a package such as emoji.py.

Here is a sample bag of words from the word list.

lst = ['✅','türkçe','Çile','ısp','İst','ğ','some','#','@','@one','#thing','','1','41','ç','ö','⏱','⏱','👏','₺','€',':)',':/'] 

The expected output is this:

out = ['✅','⏱', '⏱','👏'] 

Attempt 1

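A list comprehension that keeps every word containing at least one non-ASCII character (its UTF-8 byte length differs from its character count):

[w for w in lst if len(w) != len(w.encode())] 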

However, this does not give the desired output, because the text also contains non-ASCII letters. Also, currency symbols are not emoticons.

['✅', 'türkçe', 'Çile', 'ısp', 'İst', 'ğ', 'ç', 'ö', '⏱', '⏱', '👏', '₺', '€'] 
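To make the over-selection concrete, here is a minimal sketch of the same byte-length check applied to a few sample words. The helper name is_non_ascii is only illustrative and not part of the original attempt.

```python
def is_non_ascii(word):
    # str.encode() defaults to UTF-8, where every non-ASCII character
    # takes two or more bytes, so the two lengths differ.
    return len(word) != len(word.encode())

samples = ['ç', '€', '✅', 'some']
flags = {w: is_non_ascii(w) for w in samples}

# Letters, currency symbols and emoji are all flagged alike,
# so this test cannot separate emoticons from the rest.
print(flags)
```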

Attempt 2

Using the NLTK emoticon regular expression:

from nltk.tokenize.casual import EMOTICON_RE
EMOTICON_RE.findall(' '.join(lst))

However, EMOTICON_RE can only extract text expressions such as :) :/ :(

Here is the list of emoticons I am considering.

I tried to build a list of emoticons and check whether each word exists in that list, but I could not build the list from the emoticons' Unicode character codes.

Can you please suggest an approach?

I think all of these characters are in the "Symbol, Other" category (So). Therefore you can do:

import unicodedata
[w for w in lst if any(c for c in w if unicodedata.category(c) == 'So')] 
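Put together with the sample list from the question, the category filter can be sketched as a small standalone script. The helper name has_other_symbol is an assumption added for readability. Note that So is broader than emoji (for example, '©' is also in So), so this is an approximation rather than an exact emoji test.

```python
import unicodedata

lst = ['✅', 'türkçe', 'Çile', 'ısp', 'İst', 'ğ', 'some', '#', '@', '@one',
       '#thing', '', '1', '41', 'ç', 'ö', '⏱', '⏱', '👏', '₺', '€', ':)', ':/']

def has_other_symbol(word):
    # True if any character is in the Unicode "Symbol, Other" (So) category.
    # Letters are L*, currency signs are Sc and punctuation is P*,
    # so none of those match.
    return any(unicodedata.category(ch) == 'So' for ch in word)

out = [w for w in lst if has_other_symbol(w)]
print(out)
```

The empty string is dropped automatically, because any() over an empty sequence is False.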
