parsing - Ordering lexer rules in a grammar using ANTLR4 -

June 15, 2014

I'm using ANTLR4 to generate a parser. I am new to parser grammars. I've read the very helpful ANTLR Mega Tutorial, but I am still stuck on how to properly order (and/or write) my lexer and parser rules.

I want the parser to be able to handle something like this:

    Hello << name >>, how are you?

At runtime I will replace "<< name >>" with the user's name.

So mostly I am parsing text words (and punctuation, symbols, etc.), except for the occasional "<< something >>" tag, which I am calling "func" in my grammar.

Here is my grammar:

    doc: item* EOF ;
    item: (func | WORD) PUNCT? ;
    func: '<<' ID '>>' ;

    WS : [ \t\n\r] -> skip ;
    fragment LETTER : [a-zA-Z] ;
    fragment DIGIT : [0-9] ;
    fragment CHAR : (LETTER | DIGIT | SYMB ) ;
    WORD : CHAR+ ;
    ID: LETTER ( LETTER | DIGIT)* ;
    PUNCT : [.,?!] ;
    fragment SYMB : ~[a-zA-Z0-9.,?! |{}<>] ;

Side note: I added "PUNCT?" at the end of the "item" rule because it is possible, as in the example sentence I gave above, for a comma to appear right after a "func". Since a comma can also follow a "WORD", I decided to put the punctuation in "item" instead of in both "func" and "WORD".

If I run this parser on the sentence above, I get a parse tree that looks like this:

Anything highlighted in red is a parse error.

So it is not recognizing the "ID" inside the double angle brackets as an "ID". Presumably this is because "WORD" comes first in my list of lexer rules. However, I have no rule that says "<< WORD >>", only a rule that says "<< ID >>", so I'm not clear on why that is happening.

If I swap the order of "ID" and "WORD" in my grammar, so that they are in this order:

    ID: LETTER ( LETTER | DIGIT)* ;
    WORD : CHAR+ ;

and run the parser, I get a parse tree like this:

So now the "func" and "ID" rules are being handled appropriately, but none of the "WORD"s are being recognized.

How do I get past this conundrum?

I suppose one option might be to change the "func" rule to "<< WORD >>" and just treat everything as words, doing away with "ID". But I wanted to differentiate a text word from a variable identifier (for instance, no special characters are allowed in a variable identifier).

Thanks for any help!

From The Definitive ANTLR 4 Reference:

    ANTLR resolves lexical ambiguities by matching the input string to the rule specified first in the grammar.
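To see why the rule order matters for your input, here is a rough Python sketch of the lexer's decision. It is not ANTLR's actual implementation, only a toy model of two behaviours: the longest match wins, and on a tie the rule listed first wins. The regular expressions are my own translation of your lexer rules, and the names tokenize, word_first and id_first are made up for this illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: rules is a list of (name, regex) pairs in grammar order."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # A strictly longer match replaces the current best;
            # on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # mimic '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hand translation of the lexer rules (an assumption, not generated by ANTLR).
COMMON = [("'<<'", r"<<"), ("'>>'", r">>"), ("WS", r"[ \t\n\r]")]
WORD = ("WORD", r"[^.,?! |{}<>]+")       # CHAR+ with the fragments expanded
ID = ("ID", r"[a-zA-Z][a-zA-Z0-9]*")
PUNCT = ("PUNCT", r"[.,?!]")

word_first = COMMON + [WORD, ID, PUNCT]  # order used in the question
id_first = COMMON + [ID, WORD, PUNCT]    # order after swapping ID and WORD

if __name__ == "__main__":
    text = "Hello << name >>, how are you at nine o'clock?"
    print(tokenize(text, word_first)[2])   # ('WORD', 'name')
    print(tokenize(text, id_first)[2])     # ('ID', 'name')
    print(tokenize(text, id_first)[10])    # ('WORD', "o'clock")
```

The point is that the lexer decides on the token type without knowing which parser rule is waiting for it: "name" matches both WORD and ID with the same length, so whichever rule comes first gets it. "o'clock" is a WORD in either order because WORD matches more characters than ID does.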

With your grammar (in Question.g4) and a t.text file containing

    Hello << name >>, how are you at nine o'clock?

the execution of

    $ grun Question doc -tokens -diagnostics t.text

gives

    [@0,0:4='Hello',<WORD>,1:0]
    [@1,6:7='<<',<'<<'>,1:6]
    [@2,9:12='name',<WORD>,1:9]
    [@3,14:15='>>',<'>>'>,1:14]
    [@4,16:16=',',<PUNCT>,1:16]
    [@5,18:20='how',<WORD>,1:18]
    [@6,22:24='are',<WORD>,1:22]
    [@7,26:28='you',<WORD>,1:26]
    [@8,30:31='at',<WORD>,1:30]
    [@9,33:36='nine',<WORD>,1:33]
    [@10,38:44='o'clock',<WORD>,1:38]
    [@11,45:45='?',<PUNCT>,1:45]
    [@12,47:46='<EOF>',<EOF>,2:0]
    line 1:9 mismatched input 'name' expecting ID
    line 1:14 extraneous input '>>' expecting {<EOF>, '<<', WORD, PUNCT}

Now change WORD to word in the item rule, and add a word parser rule:

    item: (func | word) PUNCT? ;
    word: WORD | ID ;

and put ID before WORD:

    ID: LETTER ( LETTER | DIGIT)* ;
    WORD : CHAR+ ;

The tokens are now

    [@0,0:4='Hello',<ID>,1:0]
    [@1,6:7='<<',<'<<'>,1:6]
    [@2,9:12='name',<ID>,1:9]
    [@3,14:15='>>',<'>>'>,1:14]
    [@4,16:16=',',<PUNCT>,1:16]
    [@5,18:20='how',<ID>,1:18]
    [@6,22:24='are',<ID>,1:22]
    [@7,26:28='you',<ID>,1:26]
    [@8,30:31='at',<ID>,1:30]
    [@9,33:36='nine',<ID>,1:33]
    [@10,38:44='o'clock',<WORD>,1:38]
    [@11,45:45='?',<PUNCT>,1:45]
    [@12,47:46='<EOF>',<EOF>,2:0]

and there are no more errors. As the -gui graphic shows, you now have branches identified as word or func.
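The parser side can be checked by hand with a small Python sketch that walks the token types listed above. It is a simplified yes/no matcher for the doc, item and func rules, not what ANTLR generates (a real ANTLR parser also reports and recovers from errors). The function name parse_doc and the word_types parameter are hypothetical: word_types is the set of token types accepted where item expects a word.

```python
def parse_doc(tokens, word_types):
    """Return True if the token types match
         doc:  item* EOF ;
         item: (func | word) PUNCT? ;
         func: '<<' ID '>>' ;
       where a word is any token type in word_types."""
    i, n = 0, len(tokens)
    while i < n and tokens[i] != "EOF":
        if tokens[i] == "<<":
            # func needs exactly ID then '>>' after '<<'
            if tokens[i + 1:i + 3] != ["ID", ">>"]:
                return False
            i += 3
        elif tokens[i] in word_types:
            i += 1
        else:
            return False
        if i < n and tokens[i] == "PUNCT":  # optional trailing punctuation
            i += 1
    return i == n - 1 and tokens[i] == "EOF"

# Token types copied from the two grun runs (WORD first, then ID first).
word_first_tokens = ["WORD", "<<", "WORD", ">>", "PUNCT", "WORD", "WORD",
                     "WORD", "WORD", "WORD", "WORD", "PUNCT", "EOF"]
id_first_tokens = ["ID", "<<", "ID", ">>", "PUNCT", "ID", "ID",
                   "ID", "ID", "ID", "WORD", "PUNCT", "EOF"]

if __name__ == "__main__":
    # Original grammar: 'name' arrives as WORD, so func cannot match.
    print(parse_doc(word_first_tokens, {"WORD"}))       # False
    # ID and WORD swapped, item unchanged: plain words arrive as ID.
    print(parse_doc(id_first_tokens, {"WORD"}))         # False
    # ID first, plus  word: WORD | ID ;
    print(parse_doc(id_first_tokens, {"WORD", "ID"}))   # True
```

The three calls correspond to your original grammar, your swapped grammar, and the grammar with the added word rule. Only the last one accepts the whole sentence, because the parser rule now takes either token type wherever plain text is expected while func still insists on an ID.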
