regex - Java: splitting a comma-separated string but ignoring commas in quotes


I have a string vaguely like this:

foo,bar,c;qual="baz,blurb",d;junk="quux,syzygy" 

that I want to split by commas -- but I need to ignore commas in quotes. How can I do this? It seems a regexp approach fails; I suppose I can manually scan and enter a different mode when I see a quote, but it would be nice to use preexisting libraries. (Edit: I guess I meant libraries that are already part of the JDK or part of commonly-used libraries like Apache Commons.)

The above string should split into:

foo
bar
c;qual="baz,blurb"
d;junk="quux,syzygy"

Note: this is not a CSV file; it's a single string contained in a file with a larger overall structure.
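For reference, the manual scan mentioned in the question (walk the string and switch mode on every quote) could be sketched roughly as follows. This is only a minimal illustration: the class name QuoteAwareSplitter and the method split are made up for this example, and it assumes that quotes are balanced and that there are no escaped quotes inside quoted sections.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Hypothetical helper: splits on commas that are outside double quotes.
    // Quote characters are kept in the resulting tokens.
    // Assumes balanced quotes and no escaped quotes.
    public static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;           // switch mode on every quote
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString()); // comma outside quotes ends a token
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());         // last token (may be empty)
        return tokens;
    }

    public static void main(String[] args) {
        String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        for (String t : split(line)) {
            System.out.println("> " + t);
        }
    }
}
```

Each character is visited once, so the scan is linear in the length of the string.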

Try:

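public class Main {
    public static void main(String[] args) {
        String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        // Split on a comma only if an even number of quotes follows it.
        // The limit of -1 keeps trailing empty strings in the result.
        String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        for (String t : tokens) {
            System.out.println("> " + t);
        }
    }
}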

Output:

> foo
> bar
> c;qual="baz,blurb"
> d;junk="quux,syzygy"

In other words: split on the comma only if that comma has zero, or an even number of, quotes ahead of it.

Or, a bit friendlier for the eyes:

public class Main {
    public static void main(String[] args) {
        String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";

        String otherThanQuote = " [^\"] ";
        String quotedString = String.format(" \" %s* \" ", otherThanQuote);
        String regex = String.format("(?x) "+ // enable comments, ignore white spaces
                ",                         "+ // match a comma
                "(?=                       "+ // start positive look ahead
                "  (?:                     "+ //   start non-capturing group 1
                "    %s*                   "+ //     match 'otherThanQuote' 0 or more times
                "    %s                    "+ //     match 'quotedString'
                "  )*                      "+ //   end group 1, repeat it 0 or more times
                "  %s*                     "+ //   match 'otherThanQuote' 0 or more times
                "  $                       "+ // match the end of the string
                ")                         ", // stop positive look ahead
                otherThanQuote, quotedString, otherThanQuote);

        String[] tokens = line.split(regex, -1);
        for (String t : tokens) {
            System.out.println("> " + t);
        }
    }
}

which produces the same output as the first example.
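To make the "even number of quotes ahead" rule concrete, here is a rough sketch that applies it literally, without a regex, and compares the result with the split() call from the first example. The names QuoteParityDemo, quotesAhead and splitByParity are invented for this illustration. Like the lookahead, it rescans the rest of the string for every comma, so it is meant to show the logic rather than to be fast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuoteParityDemo {

    // Counts the double quotes in s from index 'from' to the end of the string.
    static int quotesAhead(String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    // Splits at every comma that has an even number (including zero) of quotes after it.
    public static List<String> splitByParity(String s) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',' && quotesAhead(s, i + 1) % 2 == 0) {
                tokens.add(s.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(s.substring(start)); // remainder after the last split point
        return tokens;
    }

    public static void main(String[] args) {
        String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        List<String> byRegex =
                Arrays.asList(line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1));
        // Prints whether the two approaches agree on this input.
        System.out.println(splitByParity(line).equals(byRegex));
    }
}
```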

Edit

As mentioned by @mikefhay in the comments:

I prefer using Guava's Splitter, as it has saner defaults (see the discussion above about empty matches being trimmed by String#split()), so I did:

Splitter.on(Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"))
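The trimming behaviour of String#split() that the comment refers to can be seen with plain JDK calls. The sketch below uses an input made up for this illustration, with two empty fields at the end; the class name SplitLimitDemo is likewise hypothetical. Without a limit argument, trailing empty strings are dropped; with a limit of -1, as used in the examples above, they are kept.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Same pattern as in the answer: a comma followed by an even number of quotes.
    static final String REGEX = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        // Made-up input whose last two fields are empty.
        String line = "a,\"b,c\",,";

        // Default limit (0): trailing empty strings are removed.
        System.out.println(Arrays.toString(line.split(REGEX)));     // [a, "b,c"]

        // Limit -1: trailing empty strings are kept.
        System.out.println(Arrays.toString(line.split(REGEX, -1))); // [a, "b,c", , ]
    }
}
```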
