python - Sequence words with regex


I am searching for this sequence:

nunca[adv+neg+circ] más[adv+comp+circ] compraré[v+h_predicat_action]

and

nunca más compraré

My script:

import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown]  "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ]  "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

part1 = re.findall(r"(\w+)\[adv\+neg.*?\]", corpus)
part2 = re.findall(r"(\w+)\[adv+comp+padv.*?\]", corpus)
part3 = re.findall(r"(\w+)\[v\+h_predicat.*?\]", corpus)
print(part1 + part2 + part3)

Result:

[]
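
One likely contributor to the missing matches, sketched here assuming Python 3: in the `part2` pattern the `+` characters are not escaped, so the regex engine treats them as quantifiers instead of literal plus signs. The names `unescaped` and `escaped` are illustrative only and do not come from the original script.

```python
import re

corpus = ("nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] "
          "compraré[v+h_predicat_action]")

# Unescaped "+" is a quantifier: "v+" means one or more "v", not "v" followed by "+".
unescaped = re.findall(r"(\w+)\[adv+comp+padv.*?\]", corpus)

# Escaped "\+" matches a literal plus sign inside the tag.
escaped = re.findall(r"(\w+)\[adv\+comp\+padv.*?\]", corpus)

print(unescaped)  # []
print(escaped)    # ['más']
```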

If the searched substrings can appear in arbitrary order, use the following re.findall() approach:

import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown] "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

# findall() returns one (word, key) tuple per match; keep only the word
result = ' '.join(i[0] for i in re.findall(r'(\w+)\[[^][]*(ad|v)\+[^][]*\]', corpus, re.M | re.UNICODE))
print(result)

The output:

nunca más compraré 

Regex pattern explanation:

  • (\w+) - matches a word (an alphanumeric sequence, e.g. nunca) and stores it in the first capturing group (...)

  • \[ - matches the opening square bracket [ literally

  • [^][]* - matches zero or more characters other than the square brackets ] and [

  • (ad|v) - alternation group, matches either ad or v

  • \+ - matches the plus sign + literally

  • \] - matches the closing square bracket ] literally

For example, \[[^][]*(ad|v)\+[^][]*\] matches [adv+neg+circ].
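
The pieces listed above can also be written out with `re.VERBOSE`, which allows a comment next to each part of the pattern. This is only a sketch assuming Python 3; the names `TAGGED_WORD`, `sample`, `matches` and `words` are made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
TAGGED_WORD = re.compile(r"""
    (\w+)      # group 1: the word itself, e.g. nunca
    \[         # literal opening bracket
    [^][]*     # any characters except square brackets
    (ad|v)     # group 2: either ad or v
    \+         # literal plus sign
    [^][]*     # rest of the tag
    \]         # literal closing bracket
""", re.VERBOSE)

sample = ("esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] "
          "compraré[v+h_predicat_action]")

# findall() yields (group 1, group 2) tuples; [unknown] has no "+", so it is skipped.
matches = TAGGED_WORD.findall(sample)
words = [m[0] for m in matches]
print(' '.join(words))  # nunca más compraré
```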

----------

If the order of the sequence is strict, use re.sub() instead of re.findall() to remove the bracketed tags:

import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown] "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

# pass flags by keyword: the fourth positional argument of re.sub() is count
result = re.sub(r'\[[^][]+\]', '', corpus, flags=re.M | re.UNICODE)
print(result)

The output:

me temo que buscare otras opciones esta nunca más compraré 

To extract the last three words:

print(' '.join(result.split()[-3:]))    # nunca más compraré 
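
As an alternative for the strict-order case, the three tagged tokens could be matched as one consecutive sequence instead of slicing the last three words. This is a hedged sketch assuming Python 3 and assuming the tag prefixes adv+neg, adv+comp and v+h_predicat from the question identify the wanted words; `SEQUENCE`, `match` and `phrase` are hypothetical names.

```python
import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown] "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

# Three tagged tokens that must appear consecutively and in this order.
SEQUENCE = re.compile(
    r"(\w+)\[adv\+neg[^][]*\]\s+"      # e.g. nunca[adv+neg+circ]
    r"(\w+)\[adv\+comp[^][]*\]\s+"     # e.g. más[adv+comp+...]
    r"(\w+)\[v\+h_predicat[^][]*\]"    # e.g. compraré[v+h_predicat_action]
)

match = SEQUENCE.search(corpus)
# Join the three captured words, or fall back to an empty string if nothing matched.
phrase = " ".join(match.groups()) if match else ""
print(phrase)  # nunca más compraré
```

Unlike the split()[-3:] approach, this does not depend on the sequence being at the end of the corpus.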
