python - Sequence words with regex -

February 15, 2010

I am searching for this sequence:

nunca[adv+neg+circ] más[adv+comp+circ] compraré[v+h_predicat_action]

and

nunca más compraré

My script:

import re

corpus = "me[unknown] temo[unknown] que[unknown] buscare[unknown]  otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ]  más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]"
part1 = re.findall(r"(\w+)\[adv\+neg.*?\]", corpus)
part2 = re.findall(r"(\w+)\[adv+comp+padv.*?\]", corpus)
part3 = re.findall(r"(\w+)\[v\+h_predicat.*?\]", corpus)
print(part1 + part2 + part3)

Result:

[]
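
One thing worth checking in the script above: the `+` characters in the second pattern are not escaped, so the regex engine reads them as quantifiers rather than literal plus signs. A minimal sketch of the difference, using a one-token corpus (the variable names `unescaped` and `escaped` are only for illustration):

```python
import re

corpus = "más[adv+comp+padv+h_circonstant_quantite]"

# An unescaped "+" is a quantifier ("one or more of the previous character"),
# so this pattern looks for text like "[advcomppadv...]" and finds nothing here.
unescaped = re.findall(r"(\w+)\[adv+comp+padv.*?\]", corpus)

# An escaped "\+" matches a literal plus sign.
escaped = re.findall(r"(\w+)\[adv\+comp\+padv.*?\]", corpus)

print(unescaped)  # []
print(escaped)    # ['más']
```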

If the tagged words may appear in any order, use the re.findall() approach:

import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown] "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

# each match is a (word, key) tuple; keep only the word
matches = re.findall(r'(\w+)\[[^][]*(ad|v)\+[^][]*\]', corpus, re.M | re.UNICODE)
result = ' '.join(i[0] for i in matches)
print(result)

The output:

nunca más compraré

Regex pattern explanation:

(\w+) - matches a word (an alphanumeric sequence, e.g. nunca) and stores it in the first capturing group (...)
\[ - matches the opening square bracket [ literally
[^][]* - matches zero or more characters other than the square brackets ] and [
(ad|v) - alternation group; matches either ad or v
\+ - matches the plus sign + literally
\] - matches the closing square bracket ] literally

For example, \[[^][]*(ad|v)\+[^][]*\] matches [adv+neg+circ].
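
The same pattern can also be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the names `TAGGED_WORD`, `sample`, and `words` are illustrative, and the sample string is a shortened version of the corpus.

```python
import re

# Same pattern as in the explanation, spelled out piece by piece.
TAGGED_WORD = re.compile(r"""
    (\w+)       # group 1: the word itself
    \[          # literal opening bracket
    [^][]*      # zero or more characters that are not brackets
    (ad|v)      # group 2: either "ad" or "v"
    \+          # literal plus sign
    [^][]*      # rest of the tag, still no brackets
    \]          # literal closing bracket
""", re.VERBOSE)

sample = "esta[unknown] nunca[adv+neg+circ] compraré[v+h_predicat_action]"
words = [m.group(1) for m in TAGGED_WORD.finditer(sample)]
print(words)  # ['nunca', 'compraré']
```

The token tagged [unknown] is skipped because its tag contains no plus sign.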

----------

If the words must keep their original order, use re.sub() instead of re.findall() to strip the bracketed tags:

import re

corpus = ("me[unknown] temo[unknown] que[unknown] buscare[unknown] "
          "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
          "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]")

# pass flags by keyword: the fourth positional argument of re.sub() is count
result = re.sub(r'\[[^][]+\]', '', corpus, flags=re.M | re.UNICODE)
print(result)

The output:

me temo que buscare otras opciones esta nunca más compraré

To extract the last 3 words:

print(' '.join(result.split()[-3:]))    # nunca más compraré
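
Taking the last three words relies on the sequence sitting at the end of the corpus. If the three words should be found wherever they occur, but only when they appear consecutively with the expected tags, one option is a single pattern that spells out the tag prefixes in order. This is a sketch that assumes the prefixes `adv+neg`, `adv+comp`, and `v+h_predicat` from the question identify the three parts; the names `SEQUENCE`, `match`, and `phrase` are hypothetical.

```python
import re

corpus = (
    "me[unknown] temo[unknown] que[unknown] buscare[unknown] "
    "otras[unknown] opciones[unknown] esta[unknown] nunca[adv+neg+circ] "
    "más[adv+comp+padv+h_circonstant_quantite] compraré[v+h_predicat_action]"
)

# Three tagged tokens in a row, separated by whitespace.
# [^][]* lets each tag continue with anything except brackets.
SEQUENCE = re.compile(
    r"(\w+)\[adv\+neg[^][]*\]\s+"
    r"(\w+)\[adv\+comp[^][]*\]\s+"
    r"(\w+)\[v\+h_predicat[^][]*\]"
)

match = SEQUENCE.search(corpus)
phrase = " ".join(match.groups()) if match else ""
print(phrase)  # nunca más compraré
```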
