Regex to remove text after angle brackets
I am trying to write a regex to extract names from an email "From" header. I had a regex that worked for most email clients, but I noticed that one email client sends the header in a different format, which breaks my regular expression. My initial thought was to extract whatever is inside double or single quotes, but that does not work anymore because the names are not always quoted.
I am using the regular expression ([""'])(?:(?=(\\?))\2.)*?\1
to extract the text between quotes. I think the best course of action is to remove the text inside the angle brackets, leaving me with "testing person" without the quotes, and preferably without the second occurrence after the comma, although that is not necessary.
Below are the 2 strings I am trying to extract names from:
testing person <testing.person@example.com>,testing person <testing.person@example.com>
"testing person" <testing.person@example.com>,"testing person" <testing.person@example.com>
I tried using the following, but I can't seem to figure out how to capture the first half of the string, up to the angle bracket: (?!([^<|>])).*
Any help is appreciated.
You can use a positive lookahead to take the name before the < character. For example, q(?=u) means "match a q that is followed by a u". The following example takes the names that come before <. It handles quotes and white space.
Example:
using System.Text.RegularExpressions;
string pattern = @"([\w]+[\w\s]*)(?=[\'""\s]*<{1})";
var matches = Regex.Matches(
    "testing person <testing.person@example.com>, testing person <testing.person@example.com>, \"testing person\" <testing.person@example.com>, 'testing person' <testing.person@example.com>",
    pattern);
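Explanation:
{1}: exactly 1 occurrence
*: 0 or more occurrences
+: 1 or more occurrences
\w: a word character (alphanumeric or underscore)
\s: a white space character
[]: defines a character class (a set of accepted characters)
[\'""\s]: single quote, double quote, and white space are accepted in the class; \ is the escape character, and "" is how a double quote is written inside a C# verbatim string
x(?=<): matches x only when it comes right before <
x(?=[\'""\s]*<{1}): matches x followed by 1 occurrence of <, where there may be 0 or more single quotes, double quotes, or white space characters before the <
([\w]+[\w\s]*): 1 or more word characters followed by 0 or more word characters or white space. I have added [\w]+ to ensure that it does not match empty strings.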
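To see the pattern at work on the two header strings from the question, here is a minimal sketch as a complete console program. The class name, the variable names, and the " | " separator are only for illustration. Note that [\w\s]* also consumes the space before <, so the unquoted form matches "testing person " with a trailing space; the sketch calls Trim() on each value for that reason.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Same pattern as in the example above.
        string pattern = @"([\w]+[\w\s]*)(?=[\'""\s]*<{1})";

        // The two header strings from the question.
        string[] headers =
        {
            "testing person <testing.person@example.com>,testing person <testing.person@example.com>",
            "\"testing person\" <testing.person@example.com>,\"testing person\" <testing.person@example.com>"
        };

        foreach (string header in headers)
        {
            // Collect every match and strip the trailing white space
            // that [\w\s]* picks up in the unquoted form.
            var names = Regex.Matches(header, pattern)
                             .Cast<Match>()
                             .Select(m => m.Value.Trim())
                             .ToList();

            // names[0] is enough if only the first name is wanted.
            Console.WriteLine(string.Join(" | ", names));
        }
    }
}
```

Both headers should print "testing person | testing person". Because the name part only accepts word characters and white space, a name containing other punctuation (a hyphen or an apostrophe, for example) would be captured only partially, so the character class may need widening for real-world headers.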
You can find an explanation of positive lookahead here: http://www.regular-expressions.info/lookaround.html
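As an alternative, the idea from the question (remove the text inside the angle brackets and keep what is left) can be sketched with Regex.Replace. This is not the approach of the answer above, only an illustration under the assumption that names never contain commas; the class and variable names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class StripAngleBrackets
{
    static void Main()
    {
        // Mixed quoted and unquoted names, as in the question.
        string header = "\"testing person\" <testing.person@example.com>,testing person <testing.person@example.com>";

        // Remove every <...> group together with the white space before it.
        string withoutAddresses = Regex.Replace(header, @"\s*<[^>]*>", "");

        // Assumption: names contain no commas, so a plain split is enough.
        // Trim surrounding spaces and quotes from each remaining piece.
        var names = withoutAddresses
            .Split(',')
            .Select(n => n.Trim(' ', '"', '\''))
            .Where(n => n.Length > 0)
            .ToList();

        Console.WriteLine(string.Join(" | ", names));
    }
}
```

Taking names.First() would give just the first name, which covers the "without the second occurrence" preference from the question.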